Preface



This volume contains 18 papers that were presented at the Eighth Asian
Computing Science Conference (ASIAN 2003) in Mumbai in December 2003. The
theme of the conference this year was programming languages and distributed
computation. Papers were invited on all aspects of theory, practice and
applications related to this theme.

The Program Committee invited Greg Morrisett to give the keynote talk 
(in a joint session with the International Conference on Logic Programming). 
Andrew Birrell and Mark S. Miller were also invited to give talks. The Program 
Committee selected 16 papers out of the 53 submitted. Together, these papers 
were authored by 48 people from 11 countries. 

I thank the Program Committee for doing an outstanding job under severe 
time pressure, and the Executive Committee for inviting me to chair the Program 
Committee. 

I also thank the sponsoring institutions for this conference: 

Asian Institute of Technology 

Institut National de Recherche en Informatique et en Automatique 
United Nations University/International Institute for Software Technology 
National University of Singapore 
Waseda University 

Tata Institute of Fundamental Research 
The Pennsylvania State University 
IBM TJ Watson Research Lab 



Mountain Lakes, NJ 
September 2003 
Achieving Type Safety for Low-Level Code



Greg Morrisett 

Cornell University, Ithaca NY 14853, USA 



Abstract. Type-safe, high-level languages such as Java ensure that a
wide class of failures, including buffer overruns and format string
attacks, simply cannot happen. Unfortunately, our computing
infrastructure is built with type-unsafe low-level languages such as C, and
it is economically impossible to throw away our existing operating systems,
databases, routers, etc. and re-code them all in Java.

Fortunately, a number of recent advances in static analysis, language 
design, compilation, and run-time systems have given us a set of tools 
for achieving type safety for legacy C code. In this talk, I will survey 
some of the progress that has been made in the last few years, and focus 
on the issues that remain if we are to achieve type safety, and more 
generally, security for our computing infrastructure. 



1 Overview 

In November of 1988, Robert Morris, Jr. released a worm into the Internet. One 
of the ways the worm propagated was by overflowing an input buffer for the 
gets routine of the finger daemon. The worm also took advantage of a flaw in 
the configuration of the sendmail program, where remote debugging was enabled 
by default, as well as a password cracking program. Based on infection rates at 
MIT, some have concluded that roughly 10% of the supposedly 60,000 machines 
connected to the Internet were infected by the worm over a period of a few days. 
During those few days, heroic engineers disassembled the worm, figured out what 
it was doing, and took steps to block its propagation by pushing out changes to 
various sites [1]. 

Not much has changed in 15 years, except that there are many more hosts 
on the Internet, we are more dependent on these hosts, and the worms are a lot 
faster. For instance, in January 2003, someone released the so-called “Sapphire” 
worm (also known as the SQL-Slammer worm) which took advantage of a buffer 
overrun in Microsoft’s SQL servers. This worm doubled in size every 8.5 seconds 
and managed to traverse and infect about 90% of the 100,000 susceptible hosts 
on the Internet in about 10 minutes [2]. There simply wasn’t time to determine 
what the worm was doing, construct a patch, and get the patch in place before 
the whole Internet had been hit. In fact, a patch to prevent the flaw Sapphire 
took advantage of had been out for a number of months, but many users had 
failed to apply the patch, perhaps in fear that the cure was worse than the 
poison, or perhaps out of simple laziness. 

In the case of the Blaster worm, which was released in August of 2003 and took
advantage of a buffer overrun in Windows 2000 and XP, a counter-worm was
released in a vain attempt to patch machines. Unfortunately, it managed to clog
networks in the same way that Blaster did. At Cornell University, about 1,000
out of roughly 30,000 Windows machines were infected by Blaster, and even
though the infection rate was so low, we estimate that it has cost about $133,000
in IT time so far to contain and deal with the damage. Other universities have
reported as much as $800,000 in IT costs [3].

Security and reliability are hard issues. Misconfiguration, social issues, and 
many other problems will not be solved by technology alone. But surely after 15 
years, we should be able to prevent buffer overruns and other “simple” errors in 
a proactive fashion before we ship code. In this talk, I will survey some of the 
approaches that people have used or are proposing for addressing buffer overruns 
and related problems in legacy C and C++ code. 

References 

1. Spafford, E. The Internet Worm Program: An Analysis. Purdue Technical Report 
CSD-TR-823, 1988. 
Spread of the Sapphire/Slammer Worm.
System Protected by a Type Theory

Abstract. Traditional operating systems protect themselves from user
programs with the privilege level facility of CPUs. One problem of this
protection-by-hardware approach is that system calls become very slow,
because heavy operations are required to safely switch the privilege levels
of user programs. To solve the problem, we design an operating system
that protects itself with a type theory. In our approach, user programs
are written in a typed assembly language and the kernel performs
type-checking before executing the programs. The user programs can then be
executed in the kernel mode, because the kernel knows that the
type-checked programs do not violate the safety of the kernel. Thus, system
calls become mere function calls and can be invoked very quickly. We
implemented Kernel Mode Linux (KML), which realizes our approach. Several
benchmarks show the effectiveness of KML.



1 Introduction 

One problem of traditional operating systems is that system calls are very slow. 
This is because they protect themselves from user programs by using hardware 
facilities of CPUs. For example, the Linux kernel [8] for the IA-32 CPUs [7] pro- 
tects itself by using a memory protection facility integrated with a privilege-level 
facility of the CPUs. The kernel runs in the kernel mode, the most privileged 
level, and user programs run in the user mode, the least privileged level. System 
calls are implemented by using a software interruption mechanism of the CPUs 
that can raise a privilege level in a safe and restricted way. This software inter- 
ruption and associated context switches require heavy and complex operations. 
For example, on the recent Pentium 4 CPU of the IA-32, the software inter- 
ruption and the context switches are about 132 times slower than an ordinary 
function call. Recent Linux kernels for the IA-32 CPUs, in fact, use a pair of 
special instructions, sysenter and sysexit, for fast invocation of system calls. But 
this is still 36 times slower than an ordinary function call. 

The obvious way to accelerate system calls is to execute user programs in the
kernel mode. System calls can then be handled very quickly: user programs can
access the kernel directly, so no software interruptions or context switches
are needed. However, if we naively execute user programs in the kernel
mode, safety of the kernel is totally lost, because the user programs can perform
any privileged action in the kernel mode.

In this paper, we propose an approach for protecting an operating system
kernel from user programs not with the traditional hardware protection facilities, 
but with static type-checking. In our approach, user programs are written in a 
typed assembly language (TAL) [12], which is an ordinary assembly language 
(except for being typed) that can ensure type safety of programs at the level of 
machine instructions. Then, the kernel performs type-checking before executing 
them. If the programs are successfully type-checked, the kernel can safely execute 
them because the kernel knows that the programs never perform an illegal access 
to its memory. Moreover, in this paper, we show that the type-checking can 
ensure safety of the kernel at the same level as the traditional protection-by- 
hardware approach. 

Based on our approach, we implemented an operating system, called Kernel 
Mode Linux (KML), that can execute user programs in the kernel mode. The no- 
table feature of KML is that user programs are executed as ordinary processes of 
the original Linux kernel (of course, except for their privilege levels). That is, the 
memory paging and the scheduling of processes are performed as usual. There- 
fore, user programs that consume very large memory or enter an infinite loop can 
be safely executed in the kernel mode. We also conducted several benchmarks
on KML, and the results show that system calls are invoked very fast on KML.

The rest of this paper is organized as follows. Sect. 2 formally describes 
that the hardware protection can be replaced with a type-based static program 
analysis without losing safety. Sect. 3 describes the implementation of Kernel 
Mode Linux. Sect. 4 presents the result of some performance benchmarks. Sect. 5 
mentions related work. Sect. 6 concludes this paper. 

2 Formal Arguments 

To show that the protection-by-hardware approach can be replaced with static 
type-checking, we first define an idealized abstract machine that has a notion of 
privilege levels of CPUs and system calls. Then, we show that the protection- 
by-hardware approach of traditional operating systems actually ensures safety of 
the kernel. Next, we define our typed assembly language for the abstract machine 
and show that its static type-checking ensures safety of the kernel. The following
argument follows almost the same lines as the arguments for TAL [12,11],
FTAL [5], TALT [3], etc. The fundamental difference between the previous
approaches and ours is that our abstract machine has an explicit notion of
privilege levels and an operating system kernel.

2.1 Abstract Machine 

Figure 1 defines the syntax of states of the abstract machine. The machine state S 
is defined as a tuple of code memory, data memory, a register file, a program 
counter, a privilege level and a kernel. The code memory C represents execute- 
only memory for instructions. The data memory D represents mutable memory 
for data. The register file R is defined as a map from registers to word values. The 
program counter pc is a word value that points to an instruction that is about to 
be executed. The privilege level p consists of two values: user and kernel. The 

(Code memory)    $C ::= \{0 \mapsto \iota_0, \ldots, 2^{30}-1 \mapsto \iota_{2^{30}-1}\}$
                 $(Dom(C) = \{n \mid 0 \leq n \leq 2^{30}-1\})$
(Data memory)    $D ::= \{2^{30} \mapsto w_{2^{30}}, \ldots, 2^{31}-1 \mapsto w_{2^{31}-1}\}$
                 $(Dom(D) = \{n \mid 2^{30} \leq n \leq 2^{31}-1\})$
(Register file)  $R ::= \{r_0 \mapsto w_0, \ldots, r_{31} \mapsto w_{31}\}$
(Kernel)         $K ::= \{w_0 \mapsto MetaFunc_0, \ldots\}$
                 $(Dom(K) \subseteq \{n \mid 2^{31} \leq n \leq 2^{32}-1\})$
(Instruction)    $\iota ::= add\ r_{s1}, r_{s2}, r_d \mid movi\ w, r_d \mid mov\ r_s, r_d \mid jmp\ r_s$
                 $\mid blt\ r_{s1}, r_{s2}, r_{s3} \mid ld\ w[r_s], r_d \mid st\ r_s, w[r_d] \mid illegal$
(State)          $S ::= (C, D, R, pc, p, K) \mid user\_error \mid kernel\_error$


Fig. 1. Syntax of the abstract machine states 



value user represents that the abstract machine runs in the user mode and the
value kernel represents that it runs in the kernel mode. The kernel K represents
an operating system kernel. It is defined as a map from addresses, represented
as word values, to meta functions (MetaFunc) that translate one machine state
to another. The meta functions can be viewed as inner functions of the kernel
that implement system calls. In a real machine, the meta functions are merely
sequences of instructions. In this paper, however, we do not care about their
actual representation, because they are regarded as part of the trusted
computing base in practice and nothing significant depends on the
representation. In addition, there are two special machine states that
represent an error state: user_error and kernel_error. user_error represents
an error only for user programs that does not affect safety or integrity of
the whole system. On the other hand, kernel_error represents a fatal error
that may crash the whole system.

Figure 2 defines the operational semantics of the abstract machine. It is
defined as a conventional small-step function that translates a machine state S
to another state S'. If the program counter pc points into the domain of the
code memory C, then the instruction C(pc) is executed as usual. The branch
instructions jmp and blt can be viewed as special instructions for invoking
system calls if their target addresses are in the domain of the kernel. If the
memory access instructions (ld and st) access illegal memory, that is, the code
memory or the kernel, then the machine state evaluates to user_error (if
p = user) or kernel_error (if p = kernel). If the program counter pc points
into the domain of the kernel K, a system call is executed, that is, S' is the
machine state obtained by applying the meta function pointed to by pc to S.
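
The small-step function described above can be sketched as a short interpreter. This is only an illustration, not the authors' formalization: the names State, step and run, the tuple encoding of instructions, the sparse dictionaries for memory, and the unbounded integer arithmetic are all assumptions made here for brevity.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict

CODE_END = 2**30   # Dom(C) = [0, 2^30)
DATA_END = 2**31   # Dom(D) = [2^30, 2^31)
USER_ERROR, KERNEL_ERROR = "user_error", "kernel_error"

@dataclass(frozen=True)
class State:
    code: Dict[int, tuple]        # C (sparse: only the addresses that hold code)
    data: Dict[int, int]          # D (sparse: unwritten words read as 0)
    regs: Dict[str, int]          # R
    pc: int
    priv: str                     # p: "user" or "kernel"
    kernel: Dict[int, Callable]   # K: address -> meta function

def error(p):
    return KERNEL_ERROR if p == "kernel" else USER_ERROR

def in_data(addr):
    return CODE_END <= addr < DATA_END

def step(s):
    """One evaluation step S -> S' of the abstract machine."""
    if s.pc in s.kernel:                    # system call: S' = K(pc)(S)
        return s.kernel[s.pc](s)
    if s.pc not in s.code:                  # pc outside Dom(K) and Dom(C)
        return error(s.priv)
    op, *args = s.code[s.pc]
    r = s.regs
    if op == "add":
        rs1, rs2, rd = args
        return replace(s, regs={**r, rd: r[rs1] + r[rs2]}, pc=s.pc + 1)
    if op == "movi":
        w, rd = args
        return replace(s, regs={**r, rd: w}, pc=s.pc + 1)
    if op == "mov":
        rs, rd = args
        return replace(s, regs={**r, rd: r[rs]}, pc=s.pc + 1)
    if op == "jmp":
        (rs,) = args
        return replace(s, pc=r[rs])
    if op == "blt":
        rs1, rs2, rs3 = args
        return replace(s, pc=r[rs3] if r[rs1] < r[rs2] else s.pc + 1)
    if op == "ld":                          # ld w[rs], rd
        w, rs, rd = args
        addr = r[rs] + w
        if not in_data(addr):
            return error(s.priv)
        return replace(s, regs={**r, rd: s.data.get(addr, 0)}, pc=s.pc + 1)
    if op == "st":                          # st rs, w[rd]
        rs, w, rd = args
        addr = r[rd] + w
        if not in_data(addr):
            return error(s.priv)
        return replace(s, data={**s.data, addr: r[rs]}, pc=s.pc + 1)
    return USER_ERROR                       # illegal

def run(s, max_steps=100):
    """Step until an error state is reached or max_steps is exhausted."""
    for _ in range(max_steps):
        if not isinstance(s, State):
            return s
        s = step(s)
    return s
```

Running the same out-of-range load with priv set to "user" and then to "kernel" shows the difference that matters in the rest of this section: the first ends in user_error, the second in kernel_error.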

2.2 Safety of the Traditional Protection-by-Hardware Approach 

The traditional protection-by-hardware approach can be expressed simply in the 
abstract machine. It only sets the privilege levels of the machine states to user 
to prevent kernel_error from occurring. 

$(C, D, R, pc, p, K) \mapsto S'$

If $pc \notin Dom(K) \cup Dom(C)$:  $S' = error(p)$
If $pc \in Dom(K)$:  $S' = K(pc)(S)$
If $pc \in Dom(C)$:

if $C(pc) =$                   then $S' =$         where
$add\ r_{s1}, r_{s2}, r_d$     $(D, R', pc+1)$     $R' = R\{r_d \mapsto R(r_{s1}) + R(r_{s2})\}$
$movi\ w, r_d$                 $(D, R', pc+1)$     $R' = R\{r_d \mapsto w\}$
$mov\ r_s, r_d$                $(D, R', pc+1)$     $R' = R\{r_d \mapsto R(r_s)\}$
$jmp\ r_s$                     $(D, R, R(r_s))$
$blt\ r_{s1}, r_{s2}, r_{s3}$  $(D, R, R(r_{s3}))$ when $R(r_{s1}) < R(r_{s2})$
                               $(D, R, pc+1)$      when $R(r_{s1}) \geq R(r_{s2})$
$ld\ w[r_s], r_d$              $(D, R', pc+1)$     $R' = R\{r_d \mapsto D(R(r_s) + w)\}$
                                                   when $R(r_s) + w \in Dom(D)$
                               $error(p)$          when $R(r_s) + w \notin Dom(D)$
$st\ r_s, w[r_d]$              $(D', R, pc+1)$     $D' = D\{R(r_d) + w \mapsto R(r_s)\}$
                                                   when $R(r_d) + w \in Dom(D)$
                               $error(p)$          when $R(r_d) + w \notin Dom(D)$
$illegal$                      $user\_error$

where $error(p) = kernel\_error$ when $p = kernel$
                  $user\_error$ when $p = user$

C, p and K are omitted in the above table because they never change.
Fig. 2. Operational semantics of the abstract machine 



Theorem 1 (Safety of the Traditional Protection-by-Hardware). If
meta functions of K never alter privilege levels and never translate machine
states to kernel_error, then any machine state of the form (C, D, R, pc, user, K)
never evaluates to kernel_error.

Proof. Straightforward from the operational semantics of the abstract machine. □

2.3 Typed Assembly Language 

Now, we define a TAL that prevents kernel_error from occurring. Figure 3 
shows the syntax of the typed assembly language for the abstract machine. (It 
only shows the difference from the syntax of the abstract machine.) 

Our TAL has 6 kinds of basic types: $\alpha$ (type variable), int (word value),
$\langle\tau_1, \ldots, \tau_n\rangle$ (tuple), $\forall[\Delta].\Gamma$,
$sizeof(\alpha)$, and $sizeof(\langle\tau_1, \ldots, \tau_n\rangle)$.
$\forall[\Delta].\Gamma$ is a type of instruction sequences. For example, if an
instruction sequence $I$ has a type $\forall[\Delta].\Gamma$, then the register
file of the abstract machine must have the register file type $\Gamma$
(explained below) in order to jump to and execute the instruction sequence.
$sizeof(\langle\tau_1, \ldots, \tau_n\rangle)$ is a type of a word value that
represents the size of a tuple of the type $\langle\tau_1, \ldots, \tau_n\rangle$.
For example, if a word value $w$ has a type $sizeof(\langle int, int, int\rangle)$,
then we know that $w = 3$. $sizeof(\alpha)$ is a type of a word value that
represents the size of a tuple of the type $\alpha$. Our TAL also has 4 kinds
of types for the components of the abstract machine. $\Psi_C$ is a type of code
memory, $\Psi_D$ is a type of data memory, $\Gamma$ is a type of a register
file, and $\Psi_K$ is

Fig. 3. Syntax of the typed assembly language for the abstract machine. Only the
difference from the syntax of the abstract machine is shown 



Judgment                          Meaning
$\Delta \vdash \tau$              $\tau$ is a well-formed type
$\vdash \Psi_C$                   $\Psi_C$ is a well-formed code memory type
$\vdash \Psi_D$                   $\Psi_D$ is a well-formed data memory type
$\vdash \Psi_K$                   $\Psi_K$ is a well-formed kernel type
$\Delta \vdash \Gamma$            $\Gamma$ is a well-formed register file type
$\vdash (C, D) : \Psi$            $C$ is well-formed code memory of type $\Psi_C$
                                  $D$ is well-formed data memory of type $\Psi_D$
$\Psi \vdash R : \Gamma$          $R$ is a well-formed register file of type $\Gamma$
$\Psi, \Delta \vdash w : \tau$    $w$ is a well-formed word value of type $\tau$
$\Psi \vdash T : \tau$            $T$ is a well-formed tuple of type $\tau$
$\Psi, \Delta, \Gamma \vdash I$   $I$ is a well-formed instruction sequence
$\Psi_K \vdash K$                 $K$ is a well-formed kernel
$\Psi_K \vdash S$                 $S$ is a well-formed machine state

where $\Psi = (\Psi_C, \Psi_D, \Psi_K)$



Fig. 4. Static judgments of the typed assembly language 



a type of a kernel and corresponds to the interface of system calls of the kernel. 
Although our TAL does not have rich types, we can extend it with them, such 
as existential types [12], recursive types [5], array types [10] and stack types [11], 
in theory. 

The dynamic semantics of our TAL are unchanged from the abstract machine. 
This indicates that we can erase type information of programs and no type- 
checking is required at runtime. 

Figure 4 presents the static judgments of our TAL that assert the
well-formedness of the components of the TAL. To have a well-formed machine
state, the code memory, the data memory, the kernel, the register file and the
instruction sequence that starts from the address of the program counter must
be well-formed. The static semantics for the well-formedness are presented in
Appendix A. Because of space limitations, the rules for word values, tuples and
instructions are omitted. They follow basically the same lines as the rules of
FTAL [5], etc.

The well-formedness of the kernel is defined as follows. The basic idea is that 
system calls never break the well-formedness of machine states. 

Definition 1 (The Well-formedness of the Kernel). The kernel $K$ is
well-formed, denoted as $\Psi_K \vdash K$, if:

1. $\Psi_K$ is well-formed, that is, $\vdash \Psi_K$.

2. The domain of $K$ is equal to the domain of $\Psi_K$.

3. For all $w$ in the domain of $K$, if $\Psi_K \vdash S$ where
$S = (C, D, R, w, p, K)$, then $K(w)(S)$ is neither user_error nor
kernel_error, and $\Psi_K \vdash K(w)(S)$.



2.4 Safety of the Typed Assembly Language 

Now, we show that, if a machine state S and a kernel K are well-formed, then 
S never evaluates to kernel_error. For that purpose, we first show that our 
typed assembly language satisfies the following usual Preservation and Progress 
lemmas. 

Lemma 1 (Preservation). If $\Psi_K \vdash K$, $\Psi_K \vdash S$ and
$S \mapsto S'$, then $\Psi_K \vdash S'$.

Lemma 2 (Progress). If $\Psi_K \vdash K$ and $\Psi_K \vdash S$, then there
exists $S'$ such that $S \mapsto S'$.

Proof. By case analysis on $C(pc)$ (if $pc \in Dom(C)$) and $K(pc)$ (if
$pc \in Dom(K)$), and induction on the typing derivations. Here, we only show
the proof for the case that $pc \in Dom(K)$.

If $pc \in Dom(K)$, there exists $S' = K(pc)(S)$ such that $S \mapsto S'$.
Thus, the progress lemma is satisfied. In addition, from the well-formedness of
the kernel $\Psi_K \vdash K$, we have $\Psi_K \vdash S'$. Thus, the
preservation lemma is also satisfied. □

From these two lemmas, we can prove that, if a user program can be type- 
checked, that is, if a machine state that represents the user program is well- 
formed, then the program never violates safety of the kernel from the fact that the 
machine state never evaluates to kernel_error, as long as the kernel is correctly 
implemented. Thus, we can safely replace the hardware protection facilities (the 
privilege levels) with the static type-checking of our TAL. 

Theorem 2 (Safety of the Typed Assembly Language). If $\Psi_K \vdash K$ and
$\Psi_K \vdash S$, then $S$ never evaluates to kernel_error.

Proof. Straightforward from Lemma 1 and Lemma 2. □
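
The induction behind this proof can be spelled out as follows. This is a sketch rather than the authors' own proof: it assumes, as Figure 2 suggests, that the error states have no outgoing transition, and it writes $\mapsto^{n}$ (notation introduced here) for $n$ consecutive evaluation steps.

```latex
\begin{align*}
&\text{Claim: if } \Psi_K \vdash K,\ \Psi_K \vdash S \text{ and } S \mapsto^{n} S_n,
  \text{ then } \Psi_K \vdash S_n. \\
&n = 0:\quad S_0 = S, \text{ so } \Psi_K \vdash S_0 \text{ holds by assumption.} \\
&n \to n+1:\quad \Psi_K \vdash S_n \text{ and } S_n \mapsto S_{n+1}
  \text{ give } \Psi_K \vdash S_{n+1} \text{ by Lemma 1.} \\
&\text{Conclusion: by Lemma 2 every } S_n \text{ has a successor, whereas }
  kernel\_error \text{ has none, so } S_n \neq kernel\_error \text{ for all } n.
\end{align*}
```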

2.5 Dynamic Memory Allocation 

In our framework, a dynamic memory allocation mechanism can be realized by
having the kernel provide it as a system call, instead of having a special macro
instruction as in Morrisett's TAL [12]. For example, we can use the following kernel.

Example 1 (The Kernel with the Malloc System Call). 

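$\Psi_K = \{ w_{malloc} : \forall[\alpha].\{r_0 : sizeof(\alpha),\ r_1 : \alpha,$
$\qquad r_{31} : \forall[\beta].\{r_0 : \alpha,\ r_1 : \alpha,\ r_{31} : \beta\}\}\}$

$K = \{ w_{malloc} \mapsto Malloc \}$

(Registers other than $r_0$, $r_1$ and $r_{31}$ are omitted here for the sake of brevity.)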

The meta function Malloc allocates unused memory of the size specified by r0
from the data memory D. Then, it initializes the allocated memory with the
contents of the memory specified by r1 and sets the register r0 to the address
of the allocated memory. Finally, it jumps to the return address specified in the
register r31.
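
The behaviour just described can be sketched as a function from machine state to machine state. This is an illustration under assumptions that are not in the paper: the state is a plain dictionary, free memory is tracked by a hypothetical "brk" field (a bump allocator), and running out of memory simply raises an exception.

```python
DATA_END = 2**31  # upper bound (exclusive) of the data memory domain

def malloc(state):
    """Hypothetical Malloc meta function: returns a new state, leaves the old one intact."""
    regs = state["regs"]
    size, src, ret = regs["r0"], regs["r1"], regs["r31"]
    base = state["brk"]                      # assumption: first unused data address
    if base + size > DATA_END:
        raise MemoryError("data memory exhausted (not modelled in the paper)")
    data = dict(state["data"])
    for i in range(size):                    # initialize from the tuple pointed to by r1
        data[base + i] = state["data"][src + i]
    return {
        **state,
        "data": data,
        "regs": {**regs, "r0": base},        # r0 now holds the address of the new tuple
        "pc": ret,                           # jump to the return address in r31
        "brk": base + size,
    }
```

Because r1 is left untouched, both r0 and r1 point to tuples with the same contents on return, which matches the register file type expected by the continuation in r31.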



3 Kernel Mode Linux 

Based on the argument of the previous section, we implemented Kernel Mode
Linux (KML), a modified Linux kernel for IA-32 CPUs that can execute user
programs in the kernel mode. This section describes its implementation.



3.1 How to Execute User Processes in the Kernel Mode 

In IA-32 CPUs, the privilege level of a running program is determined by the 
privilege level of the code segment in which the program is executed. A program 
counter of IA-32 CPUs consists of the CS segment register, which specifies a code 
segment, and the EIP register, which specifies an offset into the code segment. 

To execute a user process in the kernel mode, the only thing KML does is to
set the CS register of the process to the kernel code segment, the most-privileged
segment, instead of the user code segment, the least-privileged segment. Then the
process is executed in the kernel mode. We call such processes "kernel-mode
user processes".
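
As a small illustration of why changing CS is enough: on IA-32 the current privilege level is held in the low two bits of the CS selector. The sketch below assumes illustrative selector values (the actual values depend on the kernel's descriptor table layout), and the function names are hypothetical.

```python
# Illustrative selector values only; the real ones depend on the kernel's GDT layout.
KERNEL_CS = 0x60
USER_CS = 0x73

def privilege_level(cs_selector: int) -> int:
    """The current privilege level is the low two bits of the CS selector (0 = most privileged)."""
    return cs_selector & 0b11

def make_kernel_mode(saved_regs: dict) -> dict:
    """Hypothetical helper: return a copy of a process's saved registers with CS
    pointing at the kernel code segment, so the process resumes at level 0."""
    return {**saved_regs, "cs": KERNEL_CS}
```

For example, privilege_level(USER_CS) is 3, while a process whose saved CS has been rewritten by make_kernel_mode resumes at privilege level 0.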

Because of this simple approach of KML, a kernel-mode user process can
be an ordinary user process (except for its privilege level). Therefore, even if a
kernel-mode user process consumes a huge amount of memory and/or enters an
infinite loop, the kernel can reclaim the memory through its paging facility and
suspend the process through its process-scheduling facility, because KML does
not modify any code or data of these facilities.



3.2 How to Invoke System Calls from Kernel-Mode User Processes 

To ensure safety of the kernel, user programs that will be executed in the kernel 
mode must be written in TALx86 [10], a notable typed assembly language imple- 
mentation for IA-32 CPUs, or Popcorn [10], a safe dialect of the C programming 
language which can be translated to TALx86. 

In the current KML, the interface of the system calls (that is, a kernel type 
mentioned in Sect. 2) is exported as TAL’s interface file. The interface file con- 
tains pairs of a name of a system call and its TAL type. The actual address of a 
system call is looked up from the name of the system call at link time, as usual. 
Then, programmers write their programs in TALx86 or Popcorn according to 
the interface file. Invocations of system calls are written as usual function calls, 
as expected. Though almost all system calls can be exported to user programs
safely, there are a few exceptions that cannot be exported directly because they
may violate the well-formedness of the kernel (e.g., mmap/munmap system calls).

Strictly speaking, KML must type-check user programs just before executing
them to ensure safety. However, the current KML itself does not perform
type-checking, because the current TALx86 implementation cannot type-check
executable binaries (though intermediate relocatable binary objects can be
checked). Thus, the current KML needs to trust the external TALx86 assembler.
We plan to develop our own TAL type-checker for checking the executable
binaries to solve this problem.

3.3 Executing Existing Applications in the Kernel Mode 

The current KML has a loophole mechanism to execute existing applications in 
the kernel mode and eliminate the overhead of system calls without modifying 
them. Though safety is not ensured, the mechanism is useful for measuring the 
performance improvement of applications due to the elimination of the overhead 
of system calls. 

To implement the loophole mechanism, we exploit the facility of the recent
original Linux kernel for multiplexing a system call invocation into a traditional
software interruption (int 0x80) or sysenter/sysexit instructions depending on
the kind of CPU. We implemented a third branch for the multiplexer that
invokes system calls with direct function calls. Because the multiplexer executes
additional instructions to invoke system calls, a system call invocation with this
mechanism is not optimal. However, it is sufficiently fast compared to the
software interruptions and the sysenter/sysexit instructions.

3.4 The Stack Starvation Problem 

As described in Section 3.1, the basic approach of KML is quite simple. How- 
ever, there is one problem we call stack starvation. In the original Linux kernel, 
interrupts are handled by interrupt handling routines specified in the Interrupt 
Descriptor Table (IDT). When an interrupt occurs, an IA-32 CPU stops ex- 
ecution of the running program, saves its execution context and executes the 
interrupt handling routine. 

How the IA-32 CPU saves the execution context of the running program 
at interrupts depends on the privilege level of the program. If the program is 
executed in the user mode, the IA-32 CPU automatically switches its memory 
stack to a kernel stack. Then, it saves the execution context (EIP, CS, EFLAGS, 
ESP and SS registers) to the kernel stack. On the other hand, if the program is 
executed in the kernel mode, the IA-32 CPU does not switch its memory stack 
and saves the context (EIP, CS and EFLAGS registers) to the memory stack of 
the running program. 

What happens if a user process that is executed in the kernel mode on KML
accesses a part of its memory stack that is not mapped by the page tables of the
CPU? First, a page fault occurs, and the CPU tries to interrupt the user process
and jump to a page fault handler specified in the IDT. However, the CPU cannot
accomplish this work, because there is no stack for saving the execution context.
Because the process is executed in the kernel mode, the CPU never switches the
memory stack to the kernel stack. To signal this fatal situation, the CPU tries
to generate a special exception, a double fault. However, again, the CPU cannot
generate the double fault because there is no stack for saving the execution
context of the running process. Finally, the CPU gives up and resets itself. We
call this problem stack starvation.

To solve the stack starvation problem, KML exploits the task management 
facility of IA-32 CPUs. The IA-32 task management facility is provided to sup- 
port process management for kernels. Using this facility, a kernel can switch 
processes with only one instruction. However, today’s kernels do not use this 
facility because it is slower than software-only approaches. Thus the facility is 
almost forgotten by all. 

The strength of this task management facility is that it can be used to handle 
exceptions and interrupts. Tasks managed by an IA-32 CPU can be set to the 
IDT. If an interrupt occurs and a task is assigned to handle the interrupt, the 
CPU first saves the execution context of the interrupted program to a special 
memory region (called a task state segment, or TSS) of the running task, instead 
of to the memory stacks. Then, the CPU switches to the task specified in the 
IDT to handle the interrupt. The most important point is that there is no need 
to switch a memory stack if the task management facility is used to handle 
interrupts. That is, if we handle page fault exceptions with the facility, a user 
process executed in the kernel mode can access its memory stack safely. 

However, if we handle all page faults with the task management facility, 
the performance of the whole system degrades because the task-based interrupt 
handling is slower than the ordinary interrupt handling. Therefore, in KML, only 
double fault exceptions are handled with the task management facility. That is, 
only page faults caused by memory stack absence are handled by the facility. 
Thus, the performance degradation is very small and negligible because memory 
stacks rarely cause page faults. 
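
The sequence of events in this section can be summarized in a small decision function. It is a simplified model written for illustration only: the function name, its parameters and the returned strings are hypothetical, and it covers just the case of a page fault raised by touching an unmapped page of the process's own memory stack.

```python
def stack_page_fault_outcome(cpl: int, double_fault_via_task: bool) -> str:
    """What happens when the running program touches an unmapped stack page.

    cpl: current privilege level (3 = user mode, 0 = kernel mode).
    double_fault_via_task: whether the double fault entry of the IDT refers
    to a task, as in KML, rather than to an ordinary handler routine.
    """
    if cpl == 3:
        # User mode: the CPU switches to the kernel stack before saving
        # the context, so the page fault handler starts normally.
        return "page fault handler runs (context saved on kernel stack)"
    # Kernel mode: no stack switch, so saving the context on the unmapped
    # stack fails and the CPU raises a double fault instead.
    if double_fault_via_task:
        # The context is saved in the TSS of the running task; no stack is needed.
        return "double fault task runs (context saved in TSS)"
    # The double fault cannot be delivered either.
    return "CPU resets"
```

The middle branch is the one KML relies on: only the rare stack-absence case pays for the slower task-based handling, and ordinary page faults keep using the normal interrupt path.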

4 Benchmarks 

To measure the degree of performance improvement by executing user programs
in the kernel mode, we conducted two benchmarks that compared performance
of the original Linux kernel with that of KML. In each benchmark, we executed
exactly the same benchmark program both on the original Linux kernel and KML
with the system call multiplexing mechanism described in the previous section.
We also compared KML with the sysenter/sysexit mechanism. The experimental
environment is shown in Table 1.



4.1 First Benchmark: Latencies of System Calls 

The first benchmark measured latencies of 5 system calls by using LMbench [9] 
(version 2.0). getpid is the simplest system call that only obtains a process ID. 

Table 1. Experimental environment

CPU        Pentium 4 3.000GHz (L2 cache 512KB)
Memory     1GB (PC3200 DDR SDRAM)
Hard disk  120GB
OS         Linux kernel 2.5.72 (KML_2.5.72_001)


Table 2. Latencies of system calls. “Original” means the original Linux kernel and
“sysenter” means the original Linux kernel using sysenter/sysexit

           getpid   read    write   stat     fstat
Original   371.3    439.3   402.3   1157.3   608.7
sysenter   135.1    201.1   164.9   896.5    383.5
KML        16.9     91.0    53.4    756.2    204.1

(Unit: nanoseconds)



Therefore, the overhead of invoking the system call accounts for most of its
latency. read and write are basic I/O system calls. In the benchmark, read and
write were performed on the null device (i.e., /dev/null). stat and fstat are
system calls for obtaining file statistics. The result is presented in Table 2.

The result shows that the getpid system call was about 22 times faster in
KML than in the original Linux kernel. It also shows that it was about 8 times
faster in KML than using sysenter/sysexit. The latencies of the other system
calls were also improved in KML.
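
The speedup factors quoted above follow directly from Table 2. The short script below recomputes them for every system call; the latencies are copied from the table, while the names LATENCY_NS and speedup are introduced here for illustration.

```python
# Latencies in nanoseconds, copied from Table 2.
LATENCY_NS = {
    "Original": {"getpid": 371.3, "read": 439.3, "write": 402.3, "stat": 1157.3, "fstat": 608.7},
    "sysenter": {"getpid": 135.1, "read": 201.1, "write": 164.9, "stat": 896.5, "fstat": 383.5},
    "KML":      {"getpid": 16.9,  "read": 91.0,  "write": 53.4,  "stat": 756.2, "fstat": 204.1},
}

def speedup(baseline: str, syscall: str) -> float:
    """How many times faster KML is than the given baseline for one system call."""
    return LATENCY_NS[baseline][syscall] / LATENCY_NS["KML"][syscall]

if __name__ == "__main__":
    for syscall in LATENCY_NS["KML"]:
        print(f"{syscall:7s} vs Original: {speedup('Original', syscall):5.1f}x   "
              f"vs sysenter: {speedup('sysenter', syscall):5.1f}x")
```

The ratio shrinks as the work done inside the kernel grows: for stat, where path lookup dominates, KML is less than twice as fast as the original kernel.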



4.2 Second Benchmark: Throughputs of File I/O Operations 

The second benchmark examined how performance of file I/O is improved in
KML by using IOzone [13] (version 3.172). In the benchmark, we measured
throughputs of 4 I/O operations in 4 tests: Write, Re-write, Read and Re-read.
The Write test measures the throughput of writing a new file. The Re-write
test measures the throughput of writing a file that already exists. The Read test
measures the throughput of reading an existing file. The Re-read test measures
the throughput of reading a file that was recently read. In all of the 4 tests,
we measured the throughput of the I/O operations on files whose sizes range
from 16 Kbytes to 512 Kbytes. We fixed the buffer size to 16 Kbytes. The
benchmark was performed on an Ext3 file system. The result is shown in Figure 5.

The result indicates that KML can improve various I/O operations for files
of small size. The result shows that, compared to the original Linux kernel, the
throughputs of Write, Re-write, Read and Re-read were improved by up to 8%,
15.9%, 25.9% and 14.7% respectively. Compared to the sysenter/sysexit
mechanism, the throughputs of Write, Re-write, Read and Re-read were improved
by up to 2%, 13.3%, 12.6% and 15.3% respectively. There is performance
degradation in some cases (especially in the Write test). This is mainly due
to CPU cache effects.

Fig. 5. Throughputs of I/O operations. “Original” means the original Linux kernel and
“sysenter” means the original Linux kernel using sysenter/sysexit



5 Related Work 

In operating system and programming language research, there are several
studies related to the safe execution of user programs in a kernel. An
interesting difference between our research and this previous work lies in the
objective. Our objective is to execute ordinary user programs safely in kernel
mode, while the work below concentrates on how to extend a kernel safely.

5.1 SPIN Operating System 

SPIN [2] is an extensible kernel that ensures safety by language-based
protection. In SPIN, kernel extensions are written in the Modula-3 programming
language [6]. The safety of SPIN rests on the fact that Modula-3 is a type-safe
language. That is, programmers cannot write malicious kernel extensions.

The SPIN approach has two problems. The first problem is that its trusted
computing base (TCB) becomes large because the kernel must trust external
Modula-3 compilers. In SPIN, the kernel cannot check the safety of binary code.
In our approach, on the other hand, the TCB is smaller than that of SPIN because
safety is checked at the machine language level and we need not trust external
compilers. The second problem is that kernel extensions must be written in
Modula-3. In our approach, we can write user programs in various programming
languages, provided that compilers exist that translate those languages to TALs.

5.2 Software-Based Fault Isolation

Software-based Fault Isolation (SFI) [16] is a technique that modifies the binary
code of applications to ensure memory and control flow safety. In the SFI
approach, checking code is inserted before each memory access and jump
instruction of untrusted programs to ensure safety.

The problem with the SFI approach is the large overhead of the inserted runtime
safety checks. In our approach, on the other hand, safety can be mostly
ensured at load time through type-checking.
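
To make the runtime cost concrete, the sketch below shows one common form of inserted check, address sandboxing by bit masking, in which every address is forced into a fixed segment before it is used. This is an illustration only and is not taken from this paper: the segment base, segment size and all names are hypothetical.

```java
final class SfiSandbox {
    // Hypothetical sandbox segment: 64 KiB starting at 0x00400000.
    // The size must be a power of two and the base must be aligned to the size.
    static final int SEGMENT_BASE = 0x00400000;
    static final int SEGMENT_SIZE = 1 << 16;
    static final int OFFSET_MASK = SEGMENT_SIZE - 1;

    /** Forces any address into the segment by keeping only its low offset bits. */
    static int sandbox(int address) {
        return SEGMENT_BASE | (address & OFFSET_MASK);
    }

    /** Simulated store: the inserted masking runs before every memory access. */
    static void checkedStore(byte[] segment, int address, byte value) {
        int safe = sandbox(address);
        segment[safe - SEGMENT_BASE] = value;
    }
}
```

The masking costs a few extra instructions on every store and jump, which is the per-access overhead that load-time type-checking avoids.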

5.3 Foundational Proof-Carrying Code

In the Foundational Proof-Carrying Code (FPCC) [1] approach, a logical proof of
safety is attached to a user program, and the proof is verified before the
program is executed.

The FPCC approach has two advantages over the simple TAL [12]. First, the TCB
becomes very small. The TCB of FPCC consists of a proof checker, a machine
specification that represents the behavior of a CPU and memory, and a safety
policy. The TCB of the simple TAL system, on the other hand, is larger because
it includes a TAL type-checker. To solve this problem, we can take the approach
of Hamid et al. [5]. Their TAL is carefully defined so that well-formed TAL
programs are mapped to valid machine states of FPCC, and they showed that it
can be syntactically translated to an FPCC program that does not violate memory
safety or control flow safety. In their approach, the TCB can be as small as in
the FPCC approach.

Second, the safety policy can be flexible. In the FPCC approach, a safety policy
is specified in the logic of the proof checker. Therefore, we can specify a
safety policy that cannot be expressed in a simple TAL type system. For example,
a limit on memory or CPU usage can be enforced by the FPCC approach. However,
this flexibility has a drawback: proof generation may become very hard. In TALs,
on the other hand, type-checking is simple and easy. In addition, type safety
suffices to replace the hardware protection mechanisms, because those mechanisms
ensure only memory safety and control flow safety.

6 Conclusion 

In this paper, we showed that the hardware protection mechanisms that
traditional operating systems exploit can be safely replaced with static program
analysis, mainly type-checking. By discarding the hardware protection
mechanisms, the overhead of switching the privilege level of a CPU can be
eliminated and the efficiency of applications can be improved. Based on this
approach, we developed KML, an operating system in which user programs can be
executed in the kernel mode of a CPU (available at
http://www.taplas.org/~tosh/kml). The results of several benchmarks show the
effectiveness of KML.

7 Future Work

Although the current KML is effective for improving the performance of programs,
its benefit is limited because it only eliminates the overhead of system
calls. To improve performance further, we should modify the kernel and its
interface and exploit the TAL type system more aggressively. For example, we
think that a kernel can be modified to export network communication hardware
to user programs, as user-level communication technologies do (e.g., [14,15,4]).
The problem with user-level communication is a tradeoff between performance and
safety. To achieve high performance, the kernel must export network hardware
to user programs directly and give up its safety because the user programs can
access the kernel directly. To achieve safety, on the other hand, the kernel must
encapsulate network hardware behind system calls and give up high-performance
communication. By using our approach, we can achieve both high-performance
communication and safety because the overhead of system calls can be eliminated
without losing safety.

As another direction, our approach can be applied to microkernels. Traditional
microkernels suffer from the overhead of communication between the kernel
and user-level servers. By applying our approach, this overhead can be greatly
reduced. In addition, we think that a large part of a kernel itself can be
written in a strongly typed low-level language such as a TAL. Of course, it is
difficult to ensure total safety of the kernel. We think, however, that simple
memory and control flow safety is still valuable. For example, consider
synchronization primitives such as mutex locks and semaphores. It is difficult
to ensure that these primitives are used properly to prevent deadlocks. However,
it is very easy to ensure their memory safety, because they perform only simple
memory accesses.

A The Static Semantics of Our Typed Assembly Language (Rules for Types and Machine States Only)



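$$\frac{FTV(\tau) \subseteq \Delta}{\Delta \vdash \tau} \quad \text{(type)}$$

$$\frac{\forall w_i \in Dom(\Psi_C).\ w_i \in [0, 2^{30} - 1] \ \text{and}\ \Psi_C(w_i) = \forall[\Delta].\Gamma \ \text{and}\ \Delta \vdash \Gamma}{\vdash \Psi_C} \quad \text{(code memory type)}$$

$$\frac{\forall w_i \in Dom(\Psi_K).\ w_i \in [2^{31}, 2^{32} - 1] \ \text{and}\ \Psi_K(w_i) = \forall[\Delta].\Gamma \ \text{and}\ \Delta \vdash \Gamma}{\vdash \Psi_K} \quad \text{(kernel type)}$$

$$\frac{\begin{array}{l}
\forall w_i \in Dom(\Psi_D).\ TupleRange(\Psi_D, w_i) \subseteq [2^{30}, 2^{31} - 1] \\
\quad \text{and}\ \Psi_D(w_i) = \langle \tau_1, \ldots, \tau_n \rangle \ \text{and}\ \cdot \vdash \Psi_D(w_i) \\
\quad \text{and}\ \forall w_j \in Dom(\Psi_D)\ \text{s.t.}\ w_j \neq w_i. \\
\qquad TupleRange(\Psi_D, w_i)\ \text{and}\ TupleRange(\Psi_D, w_j)\ \text{do not overlap}
\end{array}}{\vdash \Psi_D} \quad \text{(data memory type)}$$

$$\frac{\forall r_i \in Dom(\Gamma).\ \Delta \vdash \Gamma(r_i)}{\Delta \vdash \Gamma} \quad \text{(register file type)}$$

where

$$TupleRange(\Psi_D, w) = [w, w + n - 1] \quad \text{where}\ \Psi_D(w) = \langle \tau_1, \ldots, \tau_n \rangle$$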



Fig. 6. The static semantics of the typed assembly language: types 
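
$$\frac{\begin{array}{l}
\vdash (C, D) : \Psi \qquad \Psi \vdash R : \Gamma \\
\text{(1) if } pc \in C:\ \ \Psi, \cdot, \Gamma \vdash InstrDec(C, pc) \\
\text{or} \\
\text{(2) if } pc \in K:\ \ \Psi_K(pc) = \forall[\Delta'].\Gamma' \qquad \cdot \vdash \tau_i \qquad [\tau_1, \ldots, \tau_n/\Delta']\Gamma' = \Gamma
\end{array}}{\vdash (C, D, R, pc, p, K)} \quad \text{(state)}$$

$$\frac{\begin{array}{l}
\vdash \Psi_C \qquad \vdash \Psi_D \\
\forall w_i \in Dom(\Psi_C).\ \Psi, \Delta_i, \Gamma_i \vdash InstrDec(C, w_i) \ \ \text{where}\ \Psi_C(w_i) = \forall[\Delta_i].\Gamma_i \\
\forall w_i \in Dom(\Psi_D).\ \Psi \vdash TupleDec(D, w_i, n) : \Psi_D(w_i) \ \ \text{where}\ \Psi_D(w_i) = \langle \tau_1, \ldots, \tau_n \rangle
\end{array}}{\vdash (C, D) : \Psi} \quad \text{(memory)}$$

$$\frac{\forall r_i \in Dom(\Gamma).\ \Psi, \cdot, \cdot \vdash R(r_i) : \Gamma(r_i)}{\Psi \vdash R : \Gamma} \quad \text{(register file)}$$

where

$$InstrDec(C, w) = \iota_w, \ldots, \iota_{2^{30}-1} \quad \text{where}\ \iota_i = C(i)$$

$$TupleDec(D, w, n) = \langle w_0, \ldots, w_{n-1} \rangle \quad \text{where}\ w_i = D(w + i)$$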

Fig. 7. The static semantics of the typed assembly language: state, memory and register 
file 

Abstract. Network services are playing an increasingly important role in the
social infrastructure built on the Internet. The more clients a service has, the
heavier the load on its server, which lowers the quality of service. To balance
load and maintain quality, multiple servers (mirror servers) that provide
access to the same service are commonly deployed. However, it is difficult to
estimate the required number of mirror servers in advance. To overcome this, we
propose a new concept called self-configurable server groups and discuss its
basic mechanism, in which all server nodes communicate with one another
in a peer-to-peer style to adjust themselves to the expected number of servers.
We also demonstrate the effectiveness of the proposed mechanism through
simulations.



1 Introduction 

The Internet is becoming more and more important as a basic infrastructure for 
many organizations and companies providing various kinds of services other than 
electronic mail or Web pages. Since server and network resources are limited, 
increasing requests from clients may inflict excess loads on servers or occupy 
excess network bandwidth, creating serious problems such as delays or cessation 
(in the worst case) of services. One approach to avoid these is to prepare mirror 
servers that offer, as the name suggests, exactly the same services. Mirror servers 
are quite helpful in balancing both the service request load and occupation of 
network bandwidth. 

However, this approach has the following problems from the administrative 
point of view. First, it is difficult to estimate a reasonable number of mirror 
servers. If too many servers are set up against the expected peak for requests, 
parts (and possibly most) of their resources will not be used most of the time 
except for the peak period. Second, it is difficult to predict the peak time and area 
for the service requests. If mirror servers are placed at inappropriate locations, 
their effect is negated because of communication delays or network bottlenecks. 

These problems result from the fact that the demands for services are
ever-changing [1], i.e., they fluctuate every moment both temporally and
spatially. A naive way of coping with these fluctuations is to keep a constant
watch over the service requests from the clients and incrementally adjust
(increase or decrease) the mirror servers manually at appropriate times and
locations. However, this needs experienced administrators who are capable of
making suitable decisions about these adjustments based on accurate estimates.
In addition, setting up or stopping mirror servers is very time-consuming. Thus,
this naive approach is difficult to implement in practice.

This paper proposes a new system called self-configurable server groups to
solve these problems. In this system, all server nodes communicate with one 
another in a peer-to-peer (P2P) style to automatically adjust the number of 
mirror servers according to the fluctuations in demand for services. Administra- 
tors do not need to statically determine the number or location of mirror servers 
in advance, or dynamically (or manually) set up or stop the mirror servers ac- 
cording to fluctuations in service requests. The targets of the self-configurable 
server groups are the services with static contents such as static Web pages, FTP 
service and so on. 

The proposed system has the following features. 

— It clearly distinguishes the policy of adjusting mirror servers from the mech- 
anism that implements the adjustment. The policy determines when and 
where mirror servers will be increased (or decreased), and the mechanism 
determines how mirror servers will be increased (or decreased). This de- 
sign principle enables administrators to set their own policies to control the 
adjustment behavior. 

— Because the system has no global states such as the total number of mir- 
ror servers or the total number of requests from clients, it thus needs no 
centralized servers. Although each server follows its given policy using only
local information, such as the number of requests to that individual server,
the whole system can attain an almost optimal number of servers.
This design enables the services based on the system to be scalable.

— The system can be used by servers that offer multiple services at the same 
time. These services are able to share the same server resources, (possibly) 
with some priority policies. Thus, the resources are appropriately and flexibly 
assigned to high-priority services or frequent requests. 

This paper is organized as follows. Section 2 discusses the basic structure and
mechanism of self-configurable server groups through some concrete examples, and
demonstrates how the proposed system adapts the configuration of
mirror servers to fluctuations in service requests. Section 3 describes the
policies that can be customized by the designer or the service administrator.
Section 4 provides details on the design of the system. Section 5 presents
simulation results under different policies, which show that with appropriate
policies the system configures mirror servers as expected. Section 6 describes
related work and Section 7 concludes the paper.

Fig. 1. Dynamically adjusting number of mirror servers.



2 Self-configurable Server Group 

2.1 Basic Behavior 

The goal of a self-configurable server group is to make it easier to adjust the
number of mirror servers, and thereby to keep the maximum service delay below
a given bound. A self-configurable server group consists of a
large number of server nodes, each of which donates its computing resources to 
the group. A server node can provide multiple services at the same time and thus 
enables the group to flexibly apportion the computing resources among these ser- 
vices. A self-configurable server group responds to dynamic fluctuations in ser- 
vice demands, i.e., it increases/decreases the number of mirror servers, depending 
on the increased/decreased demands for a specific service. The server nodes com- 
municate with one another in a peer-to-peer fashion (i.e., no centralized nodes) 
to adjust the number of mirror servers. Each server node autonomously starts 
and stops a certain service. 

Figure 1 illustrates a self-configurable server group that consists of 10 server 
nodes, and provides two services (A and B). Demand for service A increases 
in Fig. 1 (a). In response to this increase, several server nodes start to run 
the mirror servers for service A (Fig. 1 (b)). After that, demand for service A 
decreases (Fig. 1 (c)). Therefore, as we can see from Fig. 1 (d), some server nodes 
spontaneously stop service A. Likewise, server nodes start to run service B’s
mirror server when demand for it increases (Figs. 1 (e) and (f)).

A self-configurable server group promotes effective utilization of shared re- 
sources in the group because it allows resource sharing among several services. 
A single server node may provide more than one service at the same time. We 
can see a server node that provides both services A and B in Figs. 1 (b),(c) 
and (f). 

Because the server nodes in a self-configurable server group are distributed
all over the Internet, it is quite difficult to share global states such as the
load information for every server node. If we establish a centralized server
that provides such global information, it obviously becomes a hot spot as well
as a single point of failure. To avoid this, we employ a peer-to-peer model for
execution; i.e., all server nodes in the group are equivalent. Each server node
communicates with some “neighbor” nodes, and autonomously determines whether to
start or stop a specific service. We proved that satisfactory performance could
be obtained even though global information was not shared.



2.2 Mechanism 

In what follows, we briefly describe the protocol used to dynamically adjust
the number of mirror servers. The protocol consists of two phases: (1) a
service-starting phase and (2) a service-stopping phase.

Service-Starting Phase: 

In the service-starting phase, a server node acquires a service package, registers
its intention to provide that service, and starts the service if the demand for
the service increases. A service package consists of all files necessary to provide
the service. For example, a service package for the web includes an Apache web
server, HTML files, and the server’s configuration files.

Figure 2 illustrates how a self-configurable server group adjusts the number 
of mirror servers. Here, the group consists of 10 (A to J) server nodes. As can 
be seen in Fig. 2 (a), two server nodes A and B are providing a service. These 
server nodes are called running nodes. The other nodes (C to J), called idle 
nodes, do not provide any services. Server node A in Fig. 2 (a), is a well-known 
node, which is the original source for the service provided. 

(1) Acquisition of Service Package: First of all, an idle node accesses a
well-known node to obtain node information (e.g., IP addresses) on running
nodes. If the idle node does not have a service package, it requests one of the
running nodes to send one. In Fig. 2 (b), idle nodes D and E request the package
from A, while nodes H and I request it from B. Well-known node A informs
these idle nodes that B is also running the service. The running node sends the
package to the idle node after receiving the request. Note that the running node
can delay sending the package, e.g., until its load is reduced.

(2) Registration: After receiving a service package, an idle node announces 
its intention of helping a running node by registering itself as a helper node. A 
helper node is a candidate for the mirror server, and waits for a request from 
the running node to start the service. Idle nodes D and E in Fig. 2 (c) register 
themselves on running node A, and wait for A to request them to start the 
service. 

(3) Starting Service: To decide when to request a helper node to start a service,
a running node monitors its own CPU load and the number of accesses from clients.
If the running node determines that the load is too heavy to handle all accesses
from clients, it requests its helper nodes to start the service. Running nodes A
and B in Fig. 2 (d) have become heavily loaded because demand for the service has
increased. Thus, running node A requests helper node E to start the service. In the
same way, helper nodes H and I start the service at B’s request. Consequently,
there are five running nodes (A, B, E, H, I) in Fig. 2 (e). The load can be shared
among these running nodes because there are more mirror servers.

Fig. 2. Basic operation of self-configurable server groups.

Service-Stopping Phase: 

A running node, other than a well-known node, can cease providing a service 
independently of the other server nodes. It typically stops the service when 
demand for the service decreases significantly. Because a running node monitors 
its own CPU load and the number of accesses from clients, it can decide when 
to stop the service. To prevent all server nodes from stopping a specific service, 
a well-known node must continue services even if no accesses from clients are 
attempted. Server nodes H and I in Fig. 2 (f) cease the service in response to 
the decreased demand. Section 3 discusses the policy used to determine when to 
stop a service in more detail. To restart the service, the server node that stopped 
the service must register itself again and be requested to start the service. The 
protocol used to implement a self-configurable server group is listed in Table 1.
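
The two phases can be summarized as a small state machine for a single server node. The sketch below is only an illustration of the transitions described in this section: the class, enum and method names are hypothetical, and it assumes that a node keeps its service package after it stops the service.

```java
final class ServerNode {
    enum Role { IDLE, HELPER, RUNNING }

    private Role role;
    private boolean hasPackage;
    private final boolean wellKnown;

    ServerNode(boolean wellKnown) {
        this.wellKnown = wellKnown;
        // A well-known node is the original source of the service, so it starts out running.
        this.role = wellKnown ? Role.RUNNING : Role.IDLE;
        this.hasPackage = wellKnown;
    }

    Role role() { return role; }

    /** Step (1): an idle node obtains the service package from a running node. */
    void acquirePackage() { hasPackage = true; }

    /** Step (2): only an idle node that holds the package can register as a helper. */
    boolean register() {
        if (role != Role.IDLE || !hasPackage) return false;
        role = Role.HELPER;
        return true;
    }

    /** Step (3): a helper starts the service when a running node requests it. */
    boolean onStartRequest() {
        if (role != Role.HELPER) return false;
        role = Role.RUNNING;
        return true;
    }

    /** Service-stopping phase: any running node except a well-known node may stop. */
    boolean stop() {
        if (role != Role.RUNNING || wellKnown) return false;
        role = Role.IDLE; // the node must register again before it can be restarted
        return true;
    }
}
```

For example, a node created with `new ServerNode(false)` must call `acquirePackage()` and `register()` before `onStartRequest()` succeeds, and after `stop()` it has to register again, which mirrors the restart rule stated above.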

Table 1. List of messages.

(Explanatory note) Each entry gives the message name and its [argument],
followed by a description of the message.

INFO [service-name | *]
    This message is used to obtain information on the service specified by
    service-name. The server node that receives this message sends back the
    name and file size of the service package, and the IP addresses of the
    mirror servers. If the receiver does not know the specified service, an
    error message is returned. If * is specified as the service-name, the
    receiver returns information on all services it knows.

REGISTER [service-name]
    This message is used by an idle node to announce its intention of helping
    a running node. The running node that receives this message registers the
    sending idle node as a helper node, if it is running the service specified
    by service-name. Otherwise, an error message is returned.

UNREGISTER [service-name]
    A helper node uses this message to cancel its intention of helping a
    running node. The service-name specifies the service that the helper node
    was going to help. If the sending node is not registered as a helper node
    of the service-name, an error message is returned.

GET [service-name]
    This message is used to obtain the service package of the service specified
    by service-name. If the receiving node has the package, it sends it at the
    time determined by the service-acquisition policy.

START [service-name]
    This message is sent by a running node to request a helper node to start
    the service specified by service-name. After receiving this message, a
    helper node starts the service and becomes a running node of the service.

STOP [service-name]
    In case of emergencies, e.g., when a critical security hole is found, this
    message is used to stop the service specified by service-name.

ADVERTISE [service-name]
    When a helper node starts the service specified by service-name, this
    message is sent to some “neighbor” nodes (obtained by the INFO message) to
    advertise the fact that the helper node is now running the service.

UNADVERTISE [service-name]
    When a running node stops the service specified by service-name, this
    message is sent to all the helper nodes registered on the running node to
    inform them that they can no longer help this running node.

3 Customizable Policies

The mechanism described in Section 2.2 is controlled by a set of policies that
determine when and where to start or stop mirror servers. The system designer
or administrator can define tailor-made policies to meet various needs of
service users. For example, we can differentiate the quality of two services: one
is given higher priority so that its overall delay is reduced more than that of
the other. In a self-configurable server group, we can define the following policies.
Note that no policy can use global information because our mechanism does
not allow global states to be shared. Rather, each policy must be designed to
use only local information, such as the CPU load of the node on which the
policy is evaluated.

Policies Used by Idle Nodes

— Policy for Requesting Service Package: This policy is used to select 
the node from which a service package is obtained. Before starting a service, 
an idle node obtains a list of the running nodes for the service, evaluates the 
policy, and selects the node from which it will request a service package. For 
example, we can select a node based on its proximity within the network.

— Policy for Registration: This policy determines to which and how many 
running nodes the intention to provide a service should be sent. For example, 
an idle node can register itself as a helper all over the network. Conversely, 
it can be registered on just a few neighbor nodes. 

Policies Used by Running Nodes 

— Policy for Distributing a Service Package: This policy determines 

when a service package is to be distributed. As mentioned earlier, a running
node need not send a service package as soon as it is requested. For
example, a running node may send the package when its own load is suffi- 
ciently low not to affect the performance of the service. Conversely, it can 
send the package immediately on request so that the helper node is ready as 
soon as possible. 

— Policy for Starting Helper Nodes: This policy determines 1) when to 
request a helper node to start a service, and 2) which and how many helper 
nodes are requested. For example, a running node requests helper nodes 
to start a service when its own CPU load exceeds a pre-defined threshold. 
The running node may select the helper nodes from the “busy” area from
which many clients are accessing the node so that the mirror servers
can be deployed effectively. In another example, if the peak time has been
statistically established, a running node can request helper nodes to start a 
service before this peak occurs. 

— Policy for Stopping a Service: This policy determines when to stop a 
service. Typically, a running node ceases a service when the number of access 
attempts from clients drops to a pre-defined threshold. 

4 Design of Self-configurable Server Groups 

This section provides details on the design of the implementation for the pro- 
posed system. Section 4.1 describes the implementation of the mechanism while 
Section 4.2 describes customizable policies. 

4.1 Modules 

The mechanism for self-configurable server groups is composed of three
modules: the protocol processing module, the monitoring module and the dispatcher
module (Fig. 3). These modules form the daemon process of the proposed
mechanism, which runs on each server node. The details of each module are as
follows.

Fig. 3. Basic design of self-configurable server groups.

— Protocol Processing Module: This module has three main roles. First, 
it communicates with daemon processes on other server nodes by exchanging 
messages that obey the protocol defined by the proposed system. Whether 
or not to send INFO, REGISTER and START messages and the destination 
nodes of these messages are determined based on the given policies. Second, 
this module manages service packages that have been received from other 
mirror servers. Third, this module starts the service by creating a new server 
process in response to requests from other mirror servers. After creating a 
server process, it informs the monitoring module of various information (e.g., 
process ID) on the server process. 

— Monitoring Module: This module continuously collects the statistical 

information of the node by monitoring the CPU load, the number of
service requests for each service, and so on. The protocol processing module
refers to this information to determine the current state of the node.

— Dispatcher Module: This module forwards each network connection from 
a client to the corresponding server process on the node. This forwarding 
makes it easy to update the service to a new version, while its old version is 
still running. 



4.2 Description of a Policy 

The system we propose is implemented in Java, and its policies are supplied as a
Java class. To customize the policies for a service, programmers have to extend,
i.e., make a subclass of, the predefined abstract class Policy. Fig. 4 shows a
simple example: the definition of a subclass named SamplePolicy.

class SamplePolicy extends Policy {

    public void sendService() {
        MonitorModule m = Module.getMonitoringModule();
        ServiceInfo s = m.getServiceInfo(serv_id);
        // Act only when the node load or the connection count has reached its upper limit.
        if (getNodeMaxLoad() <= m.getNodeLoad() ||
            getConnectionUpperLimitNum() <= s.getConnectionNum()) {
            // Number of helper nodes to be queued for starting.
            int startnum = (int)((s.getConnectionNum() / getConnectionUpperLimitNum() +
                                  s.getConnectionNum() / getConnectionLowerLimitNum() - 1) / 2);
            Iterator i = s.getHelperNodeSet().entrySet().iterator();
            while (startnum > 0 && i.hasNext()) {
                s.getStartQueue().add(i.next());
                startnum--;
            }
        }
    }

    public boolean startService() {
        MonitorModule m = Module.getMonitoringModule();
        // True while the node load is below the configured maximum.
        if (m.getNodeLoad() < getNodeMaxLoad())
            return true;
        return false;
    }

    public boolean stopService() {
        MonitorModule m = Module.getMonitoringModule();
        ServiceInfo s = m.getServiceInfo(serv_id);
        // True when the connection count has dropped below the lower limit.
        if (s.getConnectionNum() < getConnectionLowerLimitNum())
            return true;
        return false;
    }
}



Fig. 4. SamplePolicy class. 



Each policy that can be customized corresponds to an instance method in
Policy. Designers and administrators who want to customize a policy have to
define a new subclass of Policy and override the corresponding instance method.
For example, the method startService() decides the policy for starting helper
nodes, and the method stopService() decides the policy for stopping a service.

Instance methods that describe policies are able to refer to the statistical
information collected by the monitoring module using methods in the
MonitorModule class. For example, in Fig. 4, the startService() method calls the
getNodeLoad() method to obtain the current CPU load of the node, and the
stopService() method uses the getConnectionNum() method to obtain the number of
connections for the service.

As we can see from Fig. 3, policies defined in a subclass of Policy are referred
to by the protocol processing module.

5 Simulations 

This section discusses the results we obtained from simulating the basic
mechanism of self-configurable server groups. The purpose of these simulations
was to confirm that the proposed system is effective in adjusting the number of
mirror servers autonomously according to given policies.

5.1 Simulation Environment

Physical Network and Nodes: The physical network is a mesh topology
with a size of 10 x 10 nodes. The network latency $NL$ between two nodes in the
mesh is proportional to their Manhattan distance $D$ and the number of packets
$p$. Thus,

$$NL = k \cdot p \cdot s \cdot D, \qquad (1)$$

where $k$ is a constant.

We define the packet size $s$ as 1.5 KB because the MTU size of Ethernet
is 1.5 KB. A message of the self-configurable server groups protocol, a request
from a client, and a reply message from a server node each consist of one packet.
Every service package has the same size, 500 packets (750 KB). We set the
proportionality constant of network latency to $k = 4$, which means that the
communication speed between the nearest two nodes is 2 Mbps and that between the
farthest two nodes is 0.111 Mbps. All server nodes were assumed to have the
same computing power.
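
The two quoted speeds follow from Equation (1) if $NL$ is read in milliseconds with $s$ in KB; the paper does not state the unit, so that reading is an assumption of this sketch, as are the class and method names. The farthest pair in a 10 x 10 mesh is two opposite corners, at Manhattan distance 18.

```java
final class NetworkLatency {
    static final double K = 4.0;          // proportionality constant k from the text
    static final double PACKET_KB = 1.5;  // packet size s in KB

    /** Equation (1): NL = k * p * s * D, interpreted here as milliseconds (assumption). */
    static double latencyMs(int packets, int manhattanDistance) {
        return K * packets * PACKET_KB * manhattanDistance;
    }

    /** Effective speed in Mbps when sending the given number of packets over distance D. */
    static double bandwidthMbps(int packets, int manhattanDistance) {
        double bits = packets * PACKET_KB * 1000 * 8;  // 1 KB taken as 1000 bytes
        double seconds = latencyMs(packets, manhattanDistance) / 1000.0;
        return bits / seconds / 1e6;
    }

    public static void main(String[] args) {
        System.out.printf("nearest pair:  %.3f Mbps%n", bandwidthMbps(1, 1));
        System.out.printf("farthest pair: %.3f Mbps%n", bandwidthMbps(1, 18));
        System.out.printf("service package to a neighbor: %.0f ms%n", latencyMs(500, 1));
    }
}
```

Under the same reading, transferring one 500-packet service package between adjacent nodes takes 3000 ms, which is why distributing packages is far more expensive than exchanging protocol messages.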

Clients and Mirror Servers: The locations of clients were randomly selected
from nodes in the physical network. Each client also randomly selected a mirror
server from the running ones and sent its service request to that server.

Service Processing and Latency Times: We assumed that the service processing
time $ST_i$ for service $i$ on a mirror server depends on the total load $l$ of
the server. When the load is small (i.e., below some threshold $L^{\mathrm{bound}}$),
$ST_i$ takes a constant time $c_i$. When the load becomes heavier, $ST_i$ increases
because of contention for resources. Thus, the service processing time $ST_i$ is
defined as follows:

$$ST_i = \begin{cases} c_i & \text{if } l \le L^{\mathrm{bound}} \\ c_i + k'_i\,(l - L^{\mathrm{bound}}) & \text{otherwise} \end{cases} \qquad (2)$$

where $k'_i$ is the proportionality constant of the service processing time. Since
all server nodes have the same computing power, $ST_i$ does not depend on the node.

For a client, service latency $SL_i$ for service i is the sum of twice the 
network latency NL and service processing time $ST_i$ on the server: 

$$SL_i = 2 \cdot NL + ST_i \qquad (3)$$

We assumed that $c_i$, $k'_i$ and $ST_i$ (and thus $SL_i$) were independent of 
i, so we omit the subscript and simply describe them as c, k', ST and SL. We 
set the constant of service processing time c to 5 milliseconds and the 
proportionality constant of service processing time k' to 2.0. 
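
Equations (1) to (3) can be combined into a small latency model. The sketch below is only an illustration under stated assumptions: times are in milliseconds, a request and its reply are one packet each, and the load threshold is passed in as the parameter `l_bound` (the value 35 used in the usage note is illustrative). All function and parameter names are hypothetical. 

```python
def network_latency(distance, packets=1, packet_size_kb=1.5, k=4.0):
    """Equation (1): NL = k * p * s * D."""
    return k * packets * packet_size_kb * distance


def service_time(load, l_bound, c=5.0, k_prime=2.0):
    """Equation (2): constant time c up to the threshold, then linear growth."""
    if load <= l_bound:
        return c
    return c + k_prime * (load - l_bound)


def service_latency(distance, load, l_bound):
    """Equation (3): one request packet out, one reply packet back, plus ST."""
    return 2 * network_latency(distance) + service_time(load, l_bound)


if __name__ == "__main__":
    for load in (10, 35, 75):
        print(load, service_latency(distance=5, load=load, l_bound=35))
```

With `l_bound=35`, a lightly loaded server adjacent to the client answers in 2 x 6 + 5 = 17 ms, while a load of 75 raises the processing time alone to 5 + 2.0 x 40 = 85 ms. 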

5.2 Policies 

To determine the policies for simulations, we adopted the following conditions 
concerning the load of each mirror server. 

- The total load l of all services must not be larger than the maximum limit 
  $L^{max}$. 
- The load $l_i$ of the i-th service has to be between the lower limit 
  $L_i^{lower}$ and the upper limit $L_i^{upper}$. 

The former is a condition that keeps the service time within an acceptable 
latency. The latter means that a node’s load for service i must be within an 
appropriate range, i.e., with not too few or too many access attempts. 

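These conditions can be expressed as: 

$$l \le L^{max}, \qquad L_i^{lower} \le l_i \le L_i^{upper} \qquad (4)$$

Note that $l = \sum_i l_i$. Additionally, $L_i^{lower}$ and $L_i^{upper}$ 
have to satisfy the following conditions: 

$$\sum_i L_i^{lower} \le L^{max}, \qquad L_i^{lower} \le L_i^{upper} \qquad (5)$$
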
We defined the policies we used in simulations based on the above conditions. 



Policy for Starting Helper Nodes: If the load of a node increases so that 
it does not satisfy condition (4), i.e., $L^{max} < l$ or 
$L_i^{upper} < l_i$, the node requests some of the helper nodes to start the 
service. A customizable policy is used here to decide the appropriate number 
of helper nodes to which requests are sent. 

If a server issues $a_i$ requests to helper nodes, the number of server nodes 
around it becomes $1 + a_i$, so the server load is expected to be 
$l_i/(1 + a_i)$. Thus, $a_i$ has to be determined to satisfy 
$L_i^{lower} \le l_i/(1 + a_i) \le L_i^{upper}$. Notice that $a_i$ can be set 
to any value that satisfies this inequality. We selected $a_i$ so that 
$l_i/(1 + a_i)$ was close to […]. 

Policy for Stopping Service: If the load of service i on a node becomes 
$l_i < L_i^{lower}$, the server stops the service, except where the server is 
well-known for the service. 
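
The two policies above can be sketched as a single decision function. This is a hedged illustration, not the simulator's code: it assumes that the number of helpers a must keep the expected per-node load load_i/(1 + a) between the lower and upper limits of the service, and it arbitrarily picks the smallest admissible a (at least one helper), since the target value used in the simulations is not recoverable here. The names `helper_range` and `decide` are hypothetical. 

```python
import math


def helper_range(load_i, lower_i, upper_i):
    """Helper counts a (as a_min, a_max) with lower_i <= load_i/(1+a) <= upper_i."""
    a_min = max(0, math.ceil(load_i / upper_i - 1))
    a_max = math.floor(load_i / lower_i - 1)
    return a_min, a_max


def decide(total_load, load_i, l_max, lower_i, upper_i, well_known=False):
    """Return (action, number_of_helper_requests) for service i on one node."""
    if total_load > l_max or load_i > upper_i:
        a_min, _ = helper_range(load_i, lower_i, upper_i)
        # Assumption: request the fewest helpers that restore the limits.
        return ("start_helpers", max(1, a_min))
    if load_i < lower_i and not well_known:
        return ("stop", 0)
    return ("keep", 0)


if __name__ == "__main__":
    print(helper_range(200, 15, 75))
    print(decide(total_load=200, load_i=200, l_max=100, lower_i=15, upper_i=75))
    print(decide(total_load=20, load_i=10, l_max=100, lower_i=15, upper_i=75))
```

A well-known server never stops its service, which is modeled here by the `well_known` flag. 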



Other Policies: We did not use information on the distance between nodes for 
policies, because we mostly found that self-configurable server groups have the 
ability to resolve temporal fluctuations. The customizable policies introduced 
in Section 3 were set as follows. 

The package request policy is that a node randomly sends requests to some 
of the currently running mirror servers. The registration policy is that a node 
randomly sends registration messages to mirror server nodes. The package 
distribution policy is that when a node receives a package request, it 
immediately replies with a service package. 



5.3 Results 

We assumed that one time unit in the simulations corresponded to 1 millisecond. 
Each simulation was executed for 1.8 million time units (30 minutes). The total 
number of service requests from clients periodically fluctuated between 0 and 
3000. As there were 100 or fewer server nodes, the number of accesses per node 
was expected to be less than 100. We thus set $L^{max}$, the maximum load for 
each server node, to 100, which was a larger value than the average number of 
accesses. We also set $L^{bound}$ to 35. We conducted two types of simulations, 
each of which measured the latency of a service. 

In simulation (1), we established the same priority for both services A and B 
by giving them the same policy, i.e., identical values of $L^{upper}$ and 
$L^{lower}$. The purpose of this was to ensure that the system we propose can 
restrict service latency within an appropriate range even if client accesses 
increase. 

The results for this simulation are in Fig. 5 (1). Service B starts after 56,250 
time units have elapsed. This clearly shows that this system can quickly adapt 
to dynamic fluctuations in access from clients. 

When the average number of accesses to each node comes close to the upper 
limit $L^{upper}$, the number of running nodes increases and the load is shared 
among many mirror servers. Conversely, when the average number of accesses to 
each node comes close to the lower limit $L^{lower}$ (= 15), some of the 
running nodes spontaneously stop the service to conserve resources. 

This also means that increased demand on one service does not seriously 
affect the other service’s latency. The average latency is approximately 
between 100 and 180 milliseconds for both services. This demonstrates that the 
new system can effectively provide several services at once. 

In simulation (2), we applied different policies for services A and B. Since 
$L_B^{upper}$ (= 45) was set lower than $L_A^{upper}$ (= 75), service B had a 
higher priority than service A and thus the mirror servers for B were expected 
to increase more rapidly (Fig. 5 (2)). Compared to the results for simulation 
(1), the latency for service B with a higher priority is shorter by 9.0%, at 
the cost of an increase in latency for service A. This is also supported by the 
number of running nodes (Fig. 5), where service B has more than service A. 
Therefore, we can control the QoS by applying different policies to different 
services. 

6 Related Work 

Bernardo et al. [2] proposed a dynamic content distribution system in response 
to changes in the Internet environment. Its main purpose was to automatically 
achieve scalable services. It deployed server agents to use location servers to 
provide lookup services and manage server agent information. Their mechanism 
was different from the one we proposed in this paper in that it needed location 
servers to be hierarchized while our mechanism allows nodes to communicate 
with one another in a peer-to-peer manner without hierarchy. 

Hadas [3] and FarGo [4,5,6] can effectively adapt a network service by changing 
the deployment and layout of application components in response to changes in 
a node’s load and the effective bandwidth between components. Since their aim 
was to lay out components automatically, they could not autonomously adjust 
the number of mirror servers according to fluctuations in service requests. 

Akamai’s [7] purpose was to decrease the download time from web servers. 
A client is able to download cached contents automatically from the closest 
Akamai server, which can reduce latency time and effectively conserve network 
bandwidth. This mechanism is the automatic distribution of service requests to 
the server nearest the client without adjusting the number of server nodes. 

Fig. 5. Results of simulations. 

There are many related works [8,9,10,11,12] that are designed for data centers, 
hosting centers or clusters, but do not target an Internet-wide collection of 
servers. 

HotRod [8] can adjust the number of servers on the basis of long-term or 
short-term predictions. This system resembles ours in that it can dynamically 
change the number of servers. However, the exploited mechanisms are totally 
different. Each node in our mechanism autonomously behaves in response to 
local information such as CPU load, while HotRod changes the number of servers 
according to a plan established by monitoring all server nodes in a cluster. 

The N1 Provisioning Server [9] is a management system for many server 
machines inside a data center, and the Oceano project [10] is a scalable 
infrastructure for utility computing that is easy to administer. Like our 
system, the server program is automatically distributed in these systems. The 
purpose of this automatic distribution is to configure a recently added machine 
in N1, or to control the number of server nodes in Oceano, but they do not 
target the server machines all over the Internet. 

Muse [11] is an operating system for hosting centers that can manage energy 
and server resources. Pinheiro et al. [12] also focus on the resource and energy 
used by a cluster. They dynamically control the number of nodes offering ser- 
vices based on load, but unlike our system, they are mainly interested in energy 
management. 



7 Conclusion 

This paper proposed a new system called self-configurable server groups as the 
basis for network servers to autonomously adjust the numbers and locations of 
mirror servers according to fluctuations in service demand. Since the system 
allows individual policies to be decided to cope with these fluctuations, the 
administrators are able to customize the way they offer services. Simulation results 
revealed that the system we propose is very effective in automatically adjusting 
the number of mirror servers according to two kinds of policies. 
Abstract. We provide a formal definition of information flows in XML 
transformations and, more generally, in the presence of type driven 
computations, and describe a sound technique to detect transformations that may 
leak private or confidential information. We also outline a general framework 
to check middleware-located information flows. 



1 Introduction 

XML is becoming the de facto standard document format for data on the Web. Its dif- 
fusion however is characterized by two correlated paradoxes: 

1. Despite the increasing success of XML for exchanging and manipulating 
information on the Web, little attention has been paid to characterize and 
analyze information flows of XML transformations and, specifically, their 
security implications. 

2. As shown by the standardization process, XML documents are intrinsically typed 
(cf. the notions of well-formedness and validity). Nevertheless, the “standard” pro- 
gramming languages used to manipulate them are essentially untyped, in the sense 
that even when they are equipped with a type system the latter does not use the type 
information of XML documents. 

If we consider the self-descriptive nature of XML documents, these paradoxes 
are less surprising than they may seem: XML documents use tags to delimit some 
content, and these tags can be considered as type information about the content 
they delimit. Therefore, XML documents (even those that do not contain a DTD) 
are in some sense “self-typed” constructions and this makes the definition of a 
type system for XML transformers difficult. Since a type system often 
constitutes a first basic step toward the definition of security analyses of 
transformations, this may partially explain the absence of formal tools to 
characterize insecure information flows. 

Goal. The aim of this article is to define and characterize information flows in XML 
transformations in order to single out potentially insecure transformations, that is trans- 
formations that may leak confidential or private information. To that end we study trans- 
formations defined in a typed programming language for XML documents. There are 
several candidates for such a language, since several attempts have been made in the 
literature to overcome the second of the XML paradoxes (e.g., HaXML [21], JWIG [3], 
Xtatic [11], XDuce [13], XQuery [8], and YATL [4]). 

In this work we characterize and analyze information flows for transformations 
defined in CDuce [2]. There are several reasons to choose CDuce as target 
language for our study. First and foremost, unlike other XML-oriented 
languages, CDuce is general purpose, in that it provides besides XML types 
several other datatypes, which makes it possible to program general (even 
XML-unrelated) applications. Second, among the known languages for XML, it 
possesses the richest type algebra. Finally, its semantic and set theoretic 
foundations make it a good candidate for defining or hosting a declarative 
query language (see [2]) and, as such, it nicely fits our scenario of global 
queries on the Web. 

V.A. Saraswat (Ed.): ASIAN 2003, LNCS 2896, pp. 33-53, 2003. 
© Springer-Verlag Berlin Heidelberg 2003 

Veronique Benzaken, Marwan Burelle, and Giuseppe Castagna 

Problems. We said that the difficulty of defining type systems for XML transforma- 
tions resides in the self-typing nature of the documents. More precisely, this self-typing 
characteristic induces “type based” (or “type driven”) computations: matching on doc- 
ument tags roughly corresponds to matching documents’ types; similarly, producing el- 
ements with different tags corresponds to outputting results of different types. In some 
sense, typed XML transformations are akin to the application of typecase constructions 
where the different cases may return differently typed results (this is accounted for in 
CDuce by the use of dynamically bound overloaded functions). The presence of type 
driven computations makes the task of capturing information flows much harder and 
constitutes the main novelty and challenge of this study. In fact we cannot resort to 
classical data flow analyses since they are usually applied to computational frameworks 
whose dynamic semantics does not strictly depend on run-time types. A second chal- 
lenge is that information flows (more precisely, their absence) are usually characterized 
in terms of the so-called non-interference property [12]. Our study demonstrates that 
in the presence of type driven computations this notion must be modified so as 
to include static type knowledge, otherwise we end up with trivial analyses. 
Finally, the last challenge is to define flow analysis for a pattern-matching 
based language (as CDuce is) since this poses at least two obstacles: (i) 
pattern matching is a dynamic type-case, therefore we have to propagate type 
information in the subsequent matches; (ii) the use of a matching policy 
(first-match as in CDuce or best match as in XSLT) 
induces dependencies among the different components of an alternative pattern as well 
as among different cases of a pattern-matching expression and this must be taken into 
account when characterizing information flows (the sole fact of knowing that a pattern 
did not match may produce a flow of information). 

Contributions. The contributions of this article are essentially three: 

1. it provides a formal definition and study of information flows in the context of XML 
transformations and, more generally, in the presence of type driven computations; 

2. it describes a sound technique to detect XML document transformations that cause 
insecure information flows, and formally proves its correctness; 

3. by defining security annotations and by relating various kinds of analyses 
(static/dynamic, sound/complete) to different query scenarios, it proposes a 
general framework for checking security of middleware-located information 
flows. 

Example. The development of our presentation can be illustrated by an example. 
Consider the following XML document which stores names and salaries of the 
workers of a fictitious company. We imagine that while generic users are 
allowed to perform queries on this document, the information about salaries 
must only be accessible to authorized users. Therefore we need a way to detect 
queries that may reveal information about salaries, in order to reject them 
when they are performed by unauthorized users. A first naive technique to 
achieve this would be to mark the salary elements and dynamically reject all 
queries that contain marks in their result. Unfortunately, this approach is 
clearly inadequate since the information about salaries can be deduced as 
follows: perform a query that returns the list of all workers whose salary is 
greater than n and then iterate the query by varying n until we obtain as many 
different results as workers. 
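
The inference described above can be made concrete with a short sketch. It assumes integer salaries within a known range and an attacker who may only run the threshold query; the binary search is merely one efficient way of "varying n". The data mirrors the example document, and the function names are hypothetical. 

```python
SALARIES = {"Durand": 6500, "Dupond": 1800, "Martin": 1200}


def workers_above(n):
    """The permitted query: surnames of workers whose salary is greater than n.

    The result contains no salary element, so the naive marking check passes.
    """
    return {surname for surname, salary in SALARIES.items() if salary > n}


def infer_salary(surname, low=0, high=1_000_000):
    """Recover a salary from query results alone by binary search on n."""
    while low < high:
        mid = (low + high) // 2
        if surname in workers_above(mid):
            low = mid + 1   # salary > mid
        else:
            high = mid      # salary <= mid
    return low


if __name__ == "__main__":
    print({surname: infer_salary(surname) for surname in SALARIES})
```

Every individual answer is free of marked elements, yet about twenty queries per worker are enough to reconstruct each salary exactly. 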

A more effective solution is to reject all the queries whose result accesses 
the value of the salary elements. For example consider the following two 
queries: 

[Q1] Get the list of all workers 
[Q2] Get the list of all workers whose salary is greater than €1600 

The first query can be always safely executed while the second one must be 
forbidden to unauthorized users. This can be obtained by enforcing an access 
control policy. For instance this is done in [7, 6] by executing a query on a 
view (in the database sense) obtained by pruning from the XML documents all 
data the owner of the query does not have the right to access. 

While enforcing access control is enough for simple policies like the above, it soon 
becomes inadequate with slightly more complicated policies. For instance imagine that 
instead of forbidding access to salaries we want to allow queries owned by generic users 
to access salaries (e.g. for statistical purposes) but in a way that prevents these queries 
from associating a specific worker with her/his salary. This corresponds to rejecting 
all queries whose result depends both on the value of salary elements and on that of 
name or surname elements (but queries like Q1 or a query that returns all salaries are 
acceptable). To enforce this constraint we have to switch from an access analysis to a 
dependency (or information flow) analysis¹. 

Causal security policies, such as above, can be formalized by the notion of non- 
interference, that can be restated for XML documents as follows: a set of elements 
does not interfere with the result of a given query if for all possible contents of the 
elements the query always returns the same result. In our example, consider the set of 
all documents obtained from our XML document by replacing the content of salary 
elements by arbitrary numeric values. Query Q1 is interference-free since when it is 
applied to all these documents it always returns the same result. Query Q2 instead is 
not interference-free since its results may differ. 
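
The definition can be illustrated by a brute-force check that replaces the salaries by values drawn from a small candidate set and compares the query results. This is only a sketch under an explicit assumption: a finite sample can refute non-interference but cannot prove it, since the definition quantifies over all possible contents. The document encoding and function names are hypothetical. 

```python
from itertools import product

WORKERS = [
    ("Durand", "Paul", 6500),
    ("Dupond", "Jean", 1800),
    ("Martin", "Jules", 1200),
]


def q1(doc):
    """Q1: the list of all workers (surname and name only)."""
    return [(surname, name) for surname, name, _ in doc]


def q2(doc):
    """Q2: the workers whose salary is greater than 1600."""
    return [(surname, name) for surname, name, salary in doc if salary > 1600]


def interference_free(query, doc, candidate_salaries):
    """True if no sampled replacement of the salaries changes the result."""
    reference = query(doc)
    for salaries in product(candidate_salaries, repeat=len(doc)):
        variant = [(s, n, new) for (s, n, _), new in zip(doc, salaries)]
        if query(variant) != reference:
            return False
    return True


if __name__ == "__main__":
    print(interference_free(q1, WORKERS, [0, 1600, 9000]))
    print(interference_free(q2, WORKERS, [0, 1600, 9000]))
```

On this sample Q1 is unaffected by the salary values while Q2 is not, in agreement with the discussion above. 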

A precise definition of non-interference constitutes the first step of our 
approach since it defines the set of queries that are safe. The following step 
is to devise one or more techniques to determine the safety/unsafety of 
queries. To that end we first classify components that store confidential 
information by annotating data elements by labels of the form $\ell_t$. The 
$\ell$ intuitively represents a security classification of the information 
stored in the element (e.g., public or private, but it could be any label from 
a possibly unordered set) while t is a type that describes the static 
information publicly available about the data’s content (e.g. for salaries it 
records that the element stores an integer 

¹ While for this specific example it is still possible to resort to access control techniques (execute 
the query on two different views obtained by stripping in one all salaries and in the other all 
names and surnames) these techniques alone soon become insufficient, as shown by the example 
in Section 6. 



<?xml version="1.0"?> 
<company> 
  <worker> 
    <surname>Durand</surname> 
    <name>Paul</name> 
    <salary>6500</salary> 
  </worker> 
  <worker> 
    <surname>Dupond</surname> 
    <name>Jean</name> 
    <salary>1800</salary> 
  </worker> 
  <worker> 
    <surname>Martin</surname> 
    <name>Jules</name> 
    <salary>1200</salary> 
  </worker> 
</company> 
in a given range)². Next we recast the notion of non-interference in terms of 
labeled elements, namely, we say that a transformation is free of interference 
from all elements labeled by $\ell$ if its result does not change when the 
content of the $\ell$-labeled elements varies over the type indicated in the 
label. Our research plan consists of the definition of three different analyses 
to be used as in the scenario of Figure 1. According to it an interactive query 
(that is, a query that was written to be executed just once) will first pass 
through a complete static analysis (that rejects transformations that are 
manifestly unsafe) and then through a sound dynamic analysis. Instead, programs 
that are expected to be used several times will pass through a cycle of sound 
static analysis (possibly preceded by a complete analysis) before being 
executed without any further dynamic check. In this paper we concentrate on the 
sound dynamic analysis for CDuce programs, that is, the grayed part of the 
figure. 

Outline. We start in Section 2 with a brief overview of the functional core of 
CDuce. In Section 3 we formally define the non-interference property for CDuce 
programs and introduce CDuce$_{\mathcal{L}}$, a conservative extension³ of 
CDuce in which expressions occurring in a program may appear labeled by 
security labels. Section 4 is the core of our work. It defines the dynamic 
analysis that detects interference free programs. The idea is to define an 
operational semantics for CDuce$_{\mathcal{L}}$ such that (i) it preserves the 
semantics of unlabeled programs and (ii) it ensures that whenever a label 
$\ell$ is absent from the final result of a program, then the program is free 
of interference from all expressions labeled by $\ell$. Thanks to these 
properties the analyzer has simply to label and run a transformation and to 
refuse to return the (unlabeled) result when this contains unauthorized labels. 
Of course the heart of the problem is label propagation in pattern matching, 
whose definition is made difficult by the type driven semantics, the presence 
of subtyping, and the use of a first-match policy. In Section 5 we prove that 
our analysis satisfies the aforementioned properties; this goes through proving 
that CDuce$_{\mathcal{L}}$ satisfies the subject-reduction property, that it 
preserves the semantics of CDuce, and that it constitutes a sound analysis for 
the non-interference property. The last two points present some technical 
difficulties (without any practical impact) due to the type system of CDuce 
that does not satisfy the minimum typing property and induces in CDuce a 
non-deterministic semantics: thus the two properties must be proved to hold for 
all possible reductions of a program. In Section 6 we comment on a more 
significant example that illustrates some security policies that cannot be 
expressed in terms of access control. We conclude our presentation by sketching 
in Section 7 some research perspectives. 

Fig. 1. Analysis scenarios 

² We do not require the existence of any order on such labels (in our framework security policies 
will be expressed in terms of presence/absence rather than relative order of labels), therefore our 
labels must not be confused with the security labels used in multilevel security systems. 

³ The extension is conservative with respect to both the type theory and the equational 
(reduction) theory. 
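
The "label, run, and refuse" idea of the dynamic analysis can be illustrated with a deliberately simplified sketch. It is not the labeled CDuce semantics of the paper: it assumes a toy model in which every value carries the set of labels it depends on, and in which a test on a labeled value taints the result whether or not the test succeeds. The class and function names are hypothetical. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Labeled:
    """A value together with the security labels it depends on."""
    value: object
    labels: frozenset = frozenset()


def all_workers(doc):
    """Q1-like query: salaries are never inspected."""
    labels = set()
    rows = []
    for surname, name, _salary in doc:
        rows.append((surname.value, name.value))
        labels |= surname.labels | name.labels
    return Labeled(rows, frozenset(labels))


def well_paid_workers(doc, threshold):
    """Q2-like query: the test on the salary propagates the salary's labels."""
    labels = set()
    rows = []
    for surname, name, salary in doc:
        labels |= salary.labels  # even a failed test reveals information
        if salary.value >= threshold:
            rows.append((surname.value, name.value))
            labels |= surname.labels | name.labels
    return Labeled(rows, frozenset(labels))


def release(result, unauthorized):
    """Return the unlabeled result unless it carries an unauthorized label."""
    if result.labels & unauthorized:
        raise PermissionError("result depends on unauthorized labels")
    return result.value


SALARY = frozenset({"salary"})
DOC = [
    (Labeled("Durand"), Labeled("Paul"), Labeled(6500, SALARY)),
    (Labeled("Dupond"), Labeled("Jean"), Labeled(1800, SALARY)),
    (Labeled("Martin"), Labeled("Jules"), Labeled(1200, SALARY)),
]
```

In this model `release(all_workers(DOC), {"salary"})` returns the three workers, whereas releasing `well_paid_workers(DOC, 1600)` raises an error even though no salary value appears in the result. 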

Related Work. Security issues for XML have been addressed by several works but 
none of them tackles the problem of information flows. They either focus on access 
control (e.g., [10, 7, 6]) or on lower level security features such as encryption and 
digital signatures for which commercial products are becoming available (e.g., [14]). For 
instance, Damiani et al. [7, 6] detect accesses to confidential data by applying a static 
marking of the documents and by dynamically stripping off marked elements. In other 
words, they deal with access control (confidential data is accessed for computing the 
result) whereas our approach accounts for implicit flows (confidential information can 
be inferred from the result) like the detection of covert channels. The same holds true 
for the work of Gabillon and Bruno [10] where access control is performed by running 
queries on views of the XML documents dynamically generated by stripping off unau- 
thorized data. Other works devise flow analyses for programming languages for XML 
(e.g., [3]) but these analyses are not developed to verify security properties. 

The study presented here draws ideas from several sources. The dynamic 
propagation of labels was first introduced in Abadi et al. [1], where a 
dependency analysis for call-by-name λ-calculus is defined by extending the 
reduction semantics to labeled λ-terms. Although their work was not motivated 
by security reasons (they address optimization issues) what we describe here 
essentially adapts their technique to type driven reductions. Label propagation 
was subsequently used for security purposes in later works, for instance 
[5, 15-17]. In particular, in [15] Myers and Liskov use labels for the same 
purposes as we do; however their security model is defined for and relies on 
languages that explicitly manipulate labels, while in our approach and in that 
of Abadi et al., properties are stated for an unlabeled language and labels are 
introduced on top of it as a technique to identify (unlabeled) programs 
satisfying these properties. Finally, all the cited label-based approaches 
fundamentally differ from the study presented here in that they do not account 
for the type driven semantics (nor for the pattern matching) distinctive of XML 
transformations. 

The presence of type driven computations precludes the use of classical 
definitions and detection techniques of non-interference (e.g. those of [19, 20]), since in this 
case ignoring static type information would yield a far too weak definition of non- 
interference. Actually, our notion of non-interference differs from the classical one in 
that the latter usually relies on a hierarchical structuring of security levels (high-level 
inputs do not interfere with low-level outputs) while here we spot non-interference of 
single pieces of code (the value of some data does not interfere with the result of a 
query). This difference must be understood as the fact that we want to characterize 
the flows in a single transformation while classical non-interference rather applies to 
system-wide flows. 
2 The CDuce Language 

CDuce is a functional programming language tailored to the manipulation of XML 
documents, for which it uses its own notation. The XML document in the previous 
section becomes in CDuce the <company>[ ... ] expression given in this section. 
The syntax is mostly self describing: tags are denoted by angle brackets and 
are followed by sequences. Sequences are delimited by square brackets and in 
this case are formed by other elements, but in general they may contain 
expressions of any type (note that some tags are followed by strings as the 
latter are encoded in CDuce as sequences of characters). This expression has 
the type Company defined right below it. The types of sequences are defined by 
regular expressions on types. For example, the first type declaration states 
that a company is a sequence tagged by <company> and composed of zero or more 
worker elements. Had we defined the type of workers as follows: 

Worker2 = <worker ceo=?Bool>[ Sname Name Salary (Email | Tel)? ] 

then worker elements would have an optional (as indicated by =?) boolean 
attribute and list a last optional element that is either of type Tel or of 
type Email. Note also that Worker is a subtype of Worker2 since every value of 
the former type is also a value of the latter type. The queries defined in the 
previous section can be expressed as: 

[Q1:] let <company>x = mycompany in transform x with <worker>[ y z _ ] -> [ <worker>[ y z ] ] 

[Q2:] type MoreThanMe = <salary>[1600--*] 
      let <company>x = mycompany in transform x with <worker>[ y z MoreThanMe ] -> [ <worker>[ y z ] ] 

where mycompany is a variable that denotes our previous <company> XML document. 

In the first query the let expression matches mycompany against a pattern that 
binds the variable x to the sequence of worker elements of mycompany. The 
transform expression applies to each element of a sequence a pattern that 
returns a sequence and then concatenates all the results (non matching elements 
are simply discarded). In this case it transforms the sequence of Worker 
elements into a sequence of <worker>-tagged elements containing just surname 
and name elements (y and z respectively capture the surname and name elements, 
while the _ pattern matches every value). Had we defined the type Company as 
<company>[(Worker|Client)*], that is a heterogeneous sequence of workers and 
clients, then Q1 would still have returned the same result as transform would 
discard client elements. The query Q2 is similar, with the only difference that 
the transform pattern matches only if the salary element has type MoreThanMe, 
that is, if its content is an integer greater than or equal to 1600⁴. 
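
The behavior of transform described above (apply a pattern to each element, concatenate the resulting sequences, discard non-matching elements) can be mimicked in a few lines. This is a sketch under stated assumptions, not CDuce: elements are modeled as (tag, children) tuples, and a "pattern" is a hypothetical Python function returning a list on success and None on failure. 

```python
def transform(sequence, pattern):
    """Apply `pattern` to each element and concatenate the returned sequences.

    Elements for which the pattern fails (returns None) are discarded.
    """
    result = []
    for element in sequence:
        produced = pattern(element)
        if produced is not None:
            result.extend(produced)
    return result


def q1_pattern(element):
    """<worker>[ y z _ ] -> [ <worker>[ y z ] ]"""
    tag, children = element
    if tag != "worker" or len(children) != 3:
        return None
    y, z, _ = children
    return [("worker", [y, z])]


def q2_pattern(element):
    """<worker>[ y z MoreThanMe ] -> [ <worker>[ y z ] ]"""
    tag, children = element
    if tag != "worker" or len(children) != 3:
        return None
    y, z, (salary_tag, amount) = children
    if salary_tag != "salary" or not isinstance(amount, int) or amount < 1600:
        return None
    return [("worker", [y, z])]


COMPANY = [
    ("worker", [("surname", "Durand"), ("name", "Paul"), ("salary", 6500)]),
    ("worker", [("surname", "Dupond"), ("name", "Jean"), ("salary", 1800)]),
    ("worker", [("surname", "Martin"), ("name", "Jules"), ("salary", 1200)]),
]
```

Adding a hypothetical `("client", [])` element to `COMPANY` leaves the result of `transform(COMPANY, q1_pattern)` unchanged, which corresponds to the remark about heterogeneous sequences. 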

A detailed presentation of CDuce is out of the scope of this paper. The interested 
reader can refer to [2] and try the interactive prototype at www.cduce.org. For the purpose 

⁴ The results of the two queries are the central and right expressions in Figure 4 from which we 
erase all the label annotations. 



<company>[ 
  <worker>[ 
    <surname>"Durand" 
    <name>"Paul" 
    <salary>[6500] ] 
  <worker>[ 
    <surname>"Dupond" 
    <name>"Jean" 
    <salary>[1800] ] 
  <worker>[ 
    <surname>"Martin" 
    <name>"Jules" 
    <salary>[1200] ] 
] 

type Company = <company>[Worker*] 
type Worker = <worker>[Sname Name Salary] 
type Sname = <surname>String 
type Name = <name>String 
type Salary = <salary>[Int] 
of this work it will suffice to say that all CDuce constructions are encoded in 
the following core language defined in [9]: 

$$t ::= \alpha \mid C \mid \mu\alpha.C \qquad C ::= A \mid \neg C \mid C \vee C \mid C \wedge C \qquad A ::= t \to t \mid t \times t \mid b \mid 0 \mid 1$$

We use t to range over types, C over boolean combinations, A over atoms, and b 
over basic types (Int, Char, ...); 0 and 1 denote respectively the empty and 
top type, and together with arrows and products they form the atoms of 
arbitrary boolean combinations (negation, union, and intersection types) and 
recursive types $\mu\alpha.C$. The expressions of the language are: 

$$e ::= c \mid x \mid \mathsf{fun}\ f^{(t_1 \to s_1; \ldots; t_n \to s_n)}(x).e \mid e_1 e_2 \mid (e_1, e_2) \mid \mathsf{match}\ e\ \mathsf{with}\ p_1 \to e_1 \mid p_2 \to e_2$$

The syntax is essentially that of a higher-order functional language with constants, pairs 
and pattern-matching. What distinguishes it from similar languages is the definition of 
patterns (given later) and of functions. In the latter note that the name of a function 
(which may appear in its body as functions can be recursive) is annotated by a non 
empty list of arrow types (this list is called the interface of the function). The presence 
of more than one arrow type declares the overloaded nature of the function, which has 
all these types, thus their intersection as well (see the typing rule for functions later on). 

The operational semantics is defined in terms of the set $\mathcal{V}$ of 
values, ranged over by v and formed by all closed and well-typed CDuce 
expressions of the form 

$$v ::= c \mid \mathsf{fun}\ f^{(\ldots)}(x).e \mid (v_1, v_2)$$

The semantics is given by the reduction relation $\leadsto$ which is defined as 
follows: 

$$\begin{array}{lll}
v_1 v_2 & \leadsto\ e[v_1/f;\, v_2/x] & (v_1 = \mathsf{fun}\ f^{(\ldots)}(x).e) \\
\mathsf{match}\ v\ \mathsf{with}\ p_1 \to e_1 \mid p_2 \to e_2 & \leadsto\ e_1[v/p_1] & (v/p_1 \neq \Omega) \\
\mathsf{match}\ v\ \mathsf{with}\ p_1 \to e_1 \mid p_2 \to e_2 & \leadsto\ e_2[v/p_2] & (v/p_1 = \Omega,\ v/p_2 \neq \Omega)
\end{array}$$

where $\Omega$ denotes the matching failure and $e[v/p]$ is the expression 
obtained by applying the substitution $v/p$ to e. The first rule is standard 
β-reduction for recursive functions. The second rule states that if the first 
pattern matches, then the expression of the first branch is executed after 
having applied to it the substitution $v/p_1$ of the variables captured by 
$p_1$ when matching v. The third rule states that the same happens for the 
second branch when the first pattern fails and the second matches (the static 
type system of CDuce ensures that the patterns cannot both fail). 

Reductions can take place in any evaluation context. A context is an expression 
with a hole substituted for a sub-expression. An evaluation context, denoted by 
$\mathcal{E}[\,]$, is a context whose hole is not in the scope of an 
abstraction or of a pattern and which induces a (leftmost) evaluation order on 
applications and pairs. Formally, $e \leadsto e'$ implies 
$\mathcal{E}[e] \leadsto \mathcal{E}[e']$ where $\mathcal{E}$ is defined as 

$$\mathcal{E} ::= [\,] \mid (\mathcal{E}, e) \mid (v, \mathcal{E}) \mid \mathcal{E}\, e \mid v\, \mathcal{E} \mid \mathsf{match}\ \mathcal{E}\ \mathsf{with}\ p_1 \to e_1 \mid p_2 \to e_2$$

As anticipated, key definitions for CDuce are those of pattern and pattern matching. A 
regular tree built from the following grammar 

$$p ::= x \mid t \mid p_1 \,\&\, p_2 \mid p_1 \,\texttt{|}\, p_2 \mid (p_1, p_2) \mid (x := c)$$



^ The distinction between atoms and combinations is introduced in order to avoid meaningless
recursive types such as $\mu\alpha.\,\alpha \vee \alpha$.
is a pattern if (i) on every infinite branch of it there are infinitely many occurrences of the pair
constructor®, (ii) patterns in an alternative capture the same variables, and (iii) patterns
in a conjunction capture pairwise distinct variables. The semantics of patterns is defined
by the matching operation: the matching of value $v$ against a pattern $p$ is denoted by $v/p$
and returns either $\Omega$ (matching failure) or a substitution from the variables of $p$ into $\mathcal{V}$:

$$\begin{array}{rcll}
v/t &=& \{\} & \text{if } \vdash v : t\\
v/t &=& \Omega & \text{if } \vdash v : \neg t\\
v/x &=& \{x \mapsto v\} & \\
v/(x := c) &=& \{x \mapsto c\} & \\
v/p_1|p_2 &=& v/p_1 & \text{if } v/p_1 \neq \Omega\\
v/p_1|p_2 &=& v/p_2 & \text{if } v/p_1 = \Omega\\
v/p_1\&p_2 &=& v/p_1 \oplus v/p_2 & \\
(v_1,v_2)/(p_1,p_2) &=& v_1/p_1 \oplus v_2/p_2 & \\
v/(p_1,p_2) &=& \Omega & \text{if } v \text{ is not a pair}
\end{array}$$

where $\gamma_1 \oplus \gamma_2$ is $\Omega$ when $\gamma_1 = \Omega$ or $\gamma_2 = \Omega$ and otherwise is the substitution
$\gamma \in \mathcal{V}^{Dom(\gamma_1)\cup Dom(\gamma_2)}$ defined as $\gamma(x) = \gamma_1(x)$ when $x \in Dom(\gamma_1)\setminus Dom(\gamma_2)$, as
$\gamma(x) = \gamma_2(x)$ when $x \in Dom(\gamma_2)\setminus Dom(\gamma_1)$, and as $\gamma(x) = (\gamma_1(x),\gamma_2(x))$ when
$x \in Dom(\gamma_1)\cap Dom(\gamma_2)$.

The semantics of patterns is rather intuitive. There are two possible causes of failure
for a pattern matching: a type constraint which is not satisfied by the matched value,
or a pair pattern applied to a value that is not a pair. The alternative pattern $p_1|p_2$
has a first-match policy: the object is matched against $p_2$ if and only if the matching
against $p_1$ fails. When a variable $x$ appears on both sides of a pair pattern, the two
captured elements are paired together as expressed by the third case of the definition of $\oplus$.
The default-value pattern $(x := c)$ usually appears in the right-hand side of an alternative
pattern to return a default value when the left-hand side fails. The combination of
multiply-occurring variables and default-value patterns within recursive patterns allows
very expressive captures: for instance, the recursive pattern $p = (x\&\mathtt{Int},p)\,|\,(\_,p)\,|\,(x := \mathtt{nil})$
binds $x$ to the sublist of all integers occurring in a heterogeneous list (encoded à la Lisp)
when this is matched against it. On the whole, all a pattern does is to decompose values
into subcomponents that are bound to variables and/or matched against types, whence the
type-driven computation.
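The matching operation and the recursive pattern above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not CDuce's implementation: types are modelled as membership tests, pairs as Python 2-tuples, `nil` as the string `"nil"`, and all class and function names (`Var`, `TypeP`, `Rec`, `merge`, `match`, ...) are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

FAIL = None  # plays the role of the matching failure Omega


@dataclass
class Var:        # x
    name: str

@dataclass
class TypeP:      # t, modelled as a membership test on values
    accepts: Callable[[Any], bool]

@dataclass
class Default:    # (x := c)
    name: str
    const: Any

@dataclass
class And:        # p1 & p2
    left: Any
    right: Any

@dataclass
class Or:         # p1 | p2
    left: Any
    right: Any

@dataclass
class PairP:      # (p1, p2)
    left: Any
    right: Any

@dataclass
class Rec:        # recursive pattern, unfolded lazily
    unfold: Callable[[], Any]


def merge(g1, g2):
    """The (+) operator: fails if either side fails, pairs shared variables."""
    if g1 is FAIL or g2 is FAIL:
        return FAIL
    out = dict(g1)
    for x, val in g2.items():
        out[x] = (g1[x], val) if x in g1 else val
    return out


def match(v, p):
    """Return v/p: a substitution (a dict) or FAIL."""
    if isinstance(p, Rec):
        return match(v, p.unfold())
    if isinstance(p, Var):
        return {p.name: v}
    if isinstance(p, TypeP):
        return {} if p.accepts(v) else FAIL
    if isinstance(p, Default):
        return {p.name: p.const}
    if isinstance(p, Or):  # first-match policy
        first = match(v, p.left)
        return first if first is not FAIL else match(v, p.right)
    if isinstance(p, And):
        return merge(match(v, p.left), match(v, p.right))
    if isinstance(p, PairP):
        if not (isinstance(v, tuple) and len(v) == 2):
            return FAIL
        left = match(v[0], p.left)
        if left is FAIL:
            return FAIL
        return merge(left, match(v[1], p.right))
    raise TypeError(f"unknown pattern: {p!r}")


INT = TypeP(lambda v: isinstance(v, int) and not isinstance(v, bool))
ANY = TypeP(lambda v: True)

# p = (x & Int, p) | (_, p) | (x := nil)
P = Rec(lambda: Or(PairP(And(Var("x"), INT), P),
                   Or(PairP(ANY, P), Default("x", "nil"))))

hetero = (1, ("a", (2, ("b", "nil"))))  # the Lisp-style list 1, "a", 2, "b"
result = match(hetero, P)               # {"x": (1, (2, "nil"))}
```

The variable `x` occurs on both sides of the pair pattern, so the third case of the merge pairs the captured head with the capture of the tail, which is how the sublist of integers is rebuilt.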

Values are also at the basis of the definition of subtyping. Indeed, the key characteristic
of CDuce (from which all others derive) is that the subtyping relation is defined
semantically via a set-theoretic interpretation of the types. This interpretation is very
easy to define: a type is interpreted as the set of all values that have that type. So for
example $t_1 \times t_2$ is the set of all expressions $(v_1,v_2)$ where $v_i$ is a value of type $t_i$; $t_1 \to t_2$ is
the set of all closed functional expressions $\mathtt{fun}\ f^{(s_1;\ldots;s_n)}(x).e$ that when applied to a value
of type $t_1$ return a result in $t_2$; union, intersection, and negation of types have the usual
set-theoretic meaning. This (semantic) interpretation is used to define the (syntactic)
subtyping relation: a type $s$ is a subtype of $t$ if the interpretation of $s$ is contained in the
interpretation of $t$.
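The idea of subtyping as set containment can be illustrated with a toy model in Python. The sketch below is an assumption-laden approximation: it interprets types over a small finite sample of values (a hypothetical `UNIVERSE`), whereas CDuce decides subtyping symbolically over all values; the helper names are invented for the illustration.

```python
# A finite sample universe: a few atoms and all pairs of atoms.
ATOMS = [0, 1, 2, "a", "nil"]
UNIVERSE = ATOMS + [(a, b) for a in ATOMS for b in ATOMS]


def interp(t):
    """Interpret a type (a predicate on values) as the set of sample values it accepts."""
    return {v for v in UNIVERSE if t(v)}


def is_subtype(s, t):
    """s <= t iff the interpretation of s is contained in that of t (on the sample)."""
    return interp(s) <= interp(t)


def is_int(v):
    return isinstance(v, int)

def is_str(v):
    return isinstance(v, str)

def any_(v):
    return True

def prod(t1, t2):
    return lambda v: isinstance(v, tuple) and t1(v[0]) and t2(v[1])

def union(t1, t2):
    return lambda v: t1(v) or t2(v)

def inter(t1, t2):
    return lambda v: t1(v) and t2(v)

def neg(t):
    return lambda v: not t(v)


print(is_subtype(prod(is_int, is_int), prod(any_, is_int)))  # Int x Int <= Any x Int
print(is_subtype(union(is_int, is_str), is_int))             # Int | String is not <= Int
```

Because boolean connectives are interpreted as set operations, laws such as "the empty type is a subtype of every type" hold without any special rule.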

Typing rules for CDuce are summarized in Figure 2. For a more detailed explanation
the reader can refer to [9]. Rules (var), (pair), and (subsum) are standard. The rule
(appl) is less common, since one usually expects $t_1$ to be of the form $s_1 \to s_2$ and the
rule to infer for the application the type $s_2$ provided that $t_2 \le s_1$. However because of the



® Infinite trees represent recursive patterns while the condition on infinite branches endows the 
set of patterns with a well-founded order that ensures termination for the algorithms of typing 
and matching. 
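$$\frac{}{\Gamma \vdash x : \Gamma(x)}\ (\textit{var})
\qquad
\frac{\Gamma \vdash e_1 : t_1 \quad \Gamma \vdash e_2 : t_2}{\Gamma \vdash (e_1,e_2) : t_1 \times t_2}\ (\textit{pair})
\qquad
\frac{\Gamma \vdash e_1 : t_1 \quad \Gamma \vdash e_2 : t_2}{\Gamma \vdash e_1 e_2 : t_1 \bullet t_2}\ (\textit{appl})$$

$$\frac{\Gamma \vdash e : s \le \lbag p_1\rbag \vee \lbag p_2\rbag \qquad \Gamma,(s_i/p_i) \vdash e_i : t_i}{\Gamma \vdash \mathtt{match}\ e\ \mathtt{with}\ p_1 \to e_1 \mid p_2 \to e_2 : \bigvee_{\{i \mid s_i \not\simeq \mathbf{0}\}} t_i}\ (\textit{match})
\qquad (\text{for } s_1 \equiv s \wedge \lbag p_1\rbag,\ s_2 \equiv s \wedge \neg\lbag p_1\rbag)$$

$$\frac{(\forall i)\quad \Gamma,(x : t_i),(f : \bigwedge_{i=1..n} t_i \to s_i) \vdash e : s_i}{\Gamma \vdash \mathtt{fun}\ f^{(t_1\to s_1;\ldots;t_n\to s_n)}(x).e : \bigwedge_{i=1..n} t_i \to s_i}\ (\textit{abstr})
\qquad
\frac{\Gamma \vdash e : s \le t}{\Gamma \vdash e : t}\ (\textit{subsum})$$

Fig. 2. Typing rules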



presence of boolean combinations $t_1$ could be for example of the form $((s_1 \to u_1) \wedge (s_2 \to u_2))$.
To that end we introduce a partial binary operator on types, $\bullet$. Intuitively $t \bullet s$
denotes the least type (if it exists) such that $t \le s \to (t \bullet s)$, and the rule (appl) does not
fail only if the operator is defined (in practice, we subsume $t_1$ to the least arrow type
with domain $t_2$). So, for example, if our function of type $((s_1 \to u_1) \wedge (s_2 \to u_2))$ is
applied to an expression of type $(s_1 \vee s_2)$, then the type system will infer that the result
of the application has type $((s_1 \to u_1) \wedge (s_2 \to u_2)) \bullet (s_1 \vee s_2) = (u_1 \vee u_2)$. Similarly we
introduce two projection operators $\pi_1$ and $\pi_2$ such that $\pi_1(t)$ (respectively $\pi_2(t)$) is the
least type that makes $t \le \pi_1(t) \times \mathbf{1}$ (respectively $t \le \mathbf{1} \times \pi_2(t)$) hold. It is important to note
that $\bullet$, $\pi_1$, and $\pi_2$ are all computable (once more see [9]).
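To make the role of the application operator and of the projections more concrete, here is a toy Python model. It assumes, purely for illustration, that a type is a finite set of values and that a function type is an intersection of arrows given as a list of `(domain, codomain)` pairs; `apply_type` and `proj` are hypothetical names, and the real algorithms of [9] work symbolically on arbitrary types.

```python
def apply_type(arrows, arg):
    """Toy version of t . s: the least result type of applying a function whose
    type is the intersection of `arrows` to an argument of type `arg`.
    Returns None when the operator is undefined (argument not covered)."""
    result = set()
    for x in arg:
        # Every arrow whose domain contains x constrains the result for x.
        cods = [cod for dom, cod in arrows if x in dom]
        if not cods:
            return None
        result |= frozenset.intersection(*cods)
    return frozenset(result)


def proj(t, i):
    """Toy projection pi_1 (i = 0) or pi_2 (i = 1) of a type made of pairs.
    Returns None when t is not a subtype of a product."""
    if not all(isinstance(v, tuple) and len(v) == 2 for v in t):
        return None
    return frozenset(v[i] for v in t)


s1, u1 = frozenset({1, 2}), frozenset({"low"})
s2, u2 = frozenset({3, 4}), frozenset({"high"})
overloaded = [(s1, u1), (s2, u2)]          # (s1 -> u1) & (s2 -> u2)

print(apply_type(overloaded, s1 | s2) == u1 | u2)   # result type is u1 | u2
print(apply_type(overloaded, frozenset({5})))       # undefined: prints None
```

With disjoint domains, as here, the result for the argument type $s_1 \vee s_2$ is exactly $u_1 \vee u_2$, matching the example in the text.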

Rule (abstr) is not very standard, either. It states that a (possibly overloaded) function
is typed by the intersection of all the types declared in its interface: we repeat the
check for each type in the interface and handle the typing of recursive calls by recording
for $f$ the expected type. Actually the rule in Figure 2 is a simplification of the CDuce
rule in [9]; the latter is very technical and makes the minimum typing property fail (so
that there does not exist a canonical type representing the set of all types of a given
expression). However, we will see that this does not affect our analysis and, therefore,
that the rule defined in Figure 2 is enough.

Finally the rule (match) states that the type of a matching operation is the union of
the types of the branches^, and for that it uses several auxiliary notations: $\equiv$ denotes
syntactic equivalence of types whereas $\simeq$ denotes the semantic one ($s \simeq t$ only if $s$ and
$t$ denote the same set of values); $s/p$ denotes the type environment that assigns types to
the capture variables of $p$ when this is matched against a value of type $s$; $\lbag p\rbag$ denotes
the type accepted by $p$, that is, the type whose interpretation is the set of all values for
which $p$ does not fail: $\lbag p\rbag = \{v \mid v/p \neq \Omega\}$. Again, $s/p$ and $\lbag p\rbag$ are computable. The type system
is sound since it satisfies the subject reduction property: if $\Gamma \vdash e : t$ and $e \leadsto^* e'$, then
$\Gamma \vdash e' : t$ (where $\leadsto^*$ means “reduces in zero or more steps”).



3 Non-interference and Labels 

The first step toward the definition of our security analysis is the characterization of
information flows. In general, these are difficult to define and therefore it is customary to
^ Precisely, of the branches that have a chance to match, that is those for which $s_i \not\simeq \mathbf{0}$. This
distinction matters when typing overloaded functions: see [9].
rather characterize their absence via the non-interference property: given an expression
e, a sub-expression e' occurring in e does not interfere with e if and only if whenever 
e returns some result, then also every expression obtained from e by replacing e' by a 
different expression yields the same result. Note that this (informal) definition does not 
involve types and because of that it is unsuitable to describe non-interference for type 
driven semantics. A definition of non-interference for CDuce transformations must take 
types into account. 

Types have been widely used in previous work on language-based information-flow 
security (see [18] for a very broad review) and non-interference definitions have been 
given for typed languages. What is new in our framework is that the essence of the 
definition of non-interference relies on types. In particular, changing the type of an
expression occurrence in a program can change the non-interference properties of that
expression occurrence. Therefore our definition of non-interference is stated with
respect to a type $t$:

Definition 1 (Occurrence). Let $e$ be a CDuce expression. An occurrence $\Delta$ of $e$ is a
(root starting) path of the abstract syntax tree of $e$. If $\Delta$ is an occurrence of $e$, we use
$e_\Delta$ to denote the sub-expression of $e$ occurring at $\Delta$, and $e^\Delta[\,]$ to denote the context
obtained from $e$ by inserting a hole at $\Delta$. Thus $e = e^\Delta[e_\Delta]$.

Definition 2 (Non-interference w.r.t. $t$). Let $e$ be a CDuce expression, $\Delta$ an occurrence
of $e$, and $t$ a type. The occurrence $\Delta$ does not interfere in $e$ with respect to $t$ if and only
if for all CDuce values $v$ and $v'$, if $\varnothing \vdash v' : t$ and $e \leadsto^* v$ then $e^\Delta[v'] \leadsto^* v$.

Let $e$ be the application of a query to an XML document that stores some information
of type $t$ in an element located at $\Delta$. We want to define when the query allows one to
infer information about $\Delta$ more precise than its type. The definition above states that
the information stored in $\Delta$ is not disclosed if storing any value $v'$ of type $t$
in $\Delta$ does not change the result of the query.

The definition is reasonable as it simply encompasses the static knowledge about
the type of an occurrence. For instance, on our example it states that a query is safe
(i.e. interference free) if and only if it cannot distinguish two documents that contain
two different integers in corresponding salary elements. Similarly, a query that
distinguishes documents with integer salaries from documents in which salary elements
contain boolean values is safe, too*: this is reasonable to the extent that such a test
cannot in practice be performed as it would contradict the static knowledge we (or an
attacker) have of the document. This new definition also induces a very nice interpretation
of security, according to which a transformation is safe if it does not reveal (about
confidential data) any information that is not already statically known.
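When the type indexing the definition has finitely many values, Definition 2 can be checked by brute force: plug every value of the type into the occurrence and compare the results. The Python sketch below does this for a document shaped like the running example; the queries, the threshold 1500, and the finite ranges standing in for the types are assumptions made for the illustration, not the queries Q1 and Q2 of the paper.

```python
def non_interfering(query, context, values):
    """Brute-force check of non-interference: the occurrence filled by `context`
    does not interfere if the query result is the same for every candidate value."""
    outcomes = [query(context(v)) for v in values]
    return all(o == outcomes[0] for o in outcomes)


def document(first_salary):
    """The context: a document with a hole at the salary of the first worker."""
    return [
        {"surname": "Durand", "name": "Paul", "salary": first_salary},
        {"surname": "Dupond", "name": "Jean", "salary": 1800},
        {"surname": "Martin", "name": "Jules", "salary": 1200},
    ]


def all_names(doc):
    """A query that never inspects salaries."""
    return [(w["surname"], w["name"]) for w in doc]


def names_above(doc, threshold=1500):
    """A query whose result depends on a test over salaries."""
    return [(w["surname"], w["name"]) for w in doc if w["salary"] > threshold]


INT = range(0, 3000)      # finite stand-in for the type Int
HIGH = range(1600, 3000)  # finite stand-in for a type such as 1600--*

print(non_interfering(all_names, document, INT))     # True
print(non_interfering(names_above, document, INT))   # False
print(non_interfering(names_above, document, HIGH))  # True
```

The last line shows the dependence on the type: the same query becomes non-interfering once the static type of the salary already implies the outcome of its test.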

A last more technical point. In Definition 2 we tested non-interferent occurrences 
against values rather than against generic expressions. This simplifies the definition 
inasmuch as we do not have to take into account the typing and the possible capture of 
variables that are free in the expressions inserted in the context. However, this does not 
undermine the generality of our definition as shown by Proposition 1 : 

* More interestingly, had we defined salaries to be of type <salary>[1600--*], then the query Q2
would also be safe (i.e. interference free).
Definition 3 (($\Gamma/\Theta$, $t/t'$)-contexts). Let $\Gamma$ and $\Theta$ be type environments, $t$ and $t'$ types, and
$\mathcal{C}[\,]$ a context. $\mathcal{C}[\,]$ is a $(\Gamma/\Theta, t/t')$-context if $\Gamma \vdash \mathcal{C}[\,] : t$ is provable under the hypothesis
that $\Theta' \vdash [\,] : t'$ holds for every extension $\Theta'$ of $\Theta$.

Proposition 1 (Generality). Let $e$ be a CDuce expression such that $\Gamma \vdash e : t$. Let $\Delta$
be an occurrence that does not interfere in $e$ w.r.t. $t'$. Then for all $\Gamma'$, $\Gamma''$, and $e'$, if
$\Gamma',\Gamma,\Gamma'' \vdash e' : t'$, $e \leadsto^* v$, and $e^\Delta[\,]$ is a $((\Gamma',\Gamma)/\Gamma'', t/t')$-context, then $e^\Delta[e'] \leadsto^* v$.

Our next step consists in defining a way to identify occurrences. This can be easily
obtained by marking CDuce expressions with security labels. Thus we define CDuce$_\mathcal{L}$,
obtained from CDuce by adding expressions of the form “$\ell_t : e$”, where $t$ is a type and
$\ell$ is a metavariable ranging over a set $\mathcal{L}$ of labels. As anticipated, labels are indexed
by types that record the static knowledge of the expression at issue. The type is used
to verify non-interference of the expression. This is considered not to interfere with a
given computation if making the labeled occurrence vary over the values of
the type specified by the label does not affect the final result of the computation.

Of course, this is sensible only if making the occurrence vary over such
a type yields well-typed expressions. In other terms, as suggested by Proposition 1,
the type indexing a label must be a viable type for the expression marked by the label.
In order to formalize this property we endow CDuce$_\mathcal{L}$ with a type system formed by all
the typing rules of CDuce, summarized in Figure 2, plus the following one:

$$\frac{\Gamma \vdash e : s \qquad s \le t}{\Gamma \vdash (\ell_t : e) : t}$$

By definition a CDuce$_\mathcal{L}$ expression is a CDuce expression where some subexpressions
are marked by a list of labels. We call strip the function that transforms the former
into the latter by erasing all the lists of labels and we denote it by $\lfloor\,\cdot\,\rfloor$.
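The strip function is a plain structural recursion that forgets labels. A minimal Python sketch, assuming terms are encoded as nested tuples and labeled sub-terms as instances of a hypothetical `Labeled` class:

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Labeled:
    """A labeled sub-term: label, indexing type (kept symbolic here), and the term."""
    label: str
    type: str
    term: Any


def strip(e):
    """Erase every label, returning the underlying unlabeled term."""
    if isinstance(e, Labeled):
        return strip(e.term)          # nested labels are erased as well
    if isinstance(e, tuple):
        return tuple(strip(sub) for sub in e)
    return e


worker = ("worker",
          Labeled("m", "String", "Durand"),
          Labeled("n", "String", "Paul"),
          ("salary", Labeled("l", "Int", 6500)))

print(strip(worker))  # ('worker', 'Durand', 'Paul', ('salary', 6500))
```

Since labels only decorate nodes, stripping never changes the shape of the term, which is why the same path identifies an occurrence before and after stripping.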

Technically, we consider the syntax of CDuce$_\mathcal{L}$ as a description of a decoration of
the syntax tree of $e$ in the sense that the (lists of) labels are not nodes of the syntax
trees but tags marking these nodes.^ The reason for this choice is that in this way the
same path $\Delta$ denotes an occurrence of a CDuce$_\mathcal{L}$ expression and the corresponding
occurrence in the stripped expression. In this way we avoid defining complex mappings
between occurrences of expressions in CDuce$_\mathcal{L}$ and the corresponding ones in
the stripped version. In particular this makes the following property hold

Proposition 2. Let $e$ be a CDuce$_\mathcal{L}$ term with an occurrence $\Delta$, then (i) $\lfloor e\rfloor_\Delta = \lfloor e_\Delta\rfloor$
and (ii) $\lfloor e\rfloor^\Delta[\lfloor e_\Delta\rfloor] = \lfloor e^\Delta[e_\Delta]\rfloor$

which yields a simpler statement of the non-interference theorem (Theorem 2).



4 Security Analysis 

Our analysis is defined by endowing CDuce$_\mathcal{L}$ with an operational semantics defined
so that (i) it preserves CDuce semantics on stripped terms ($e \leadsto^* v \Rightarrow \lfloor e\rfloor \leadsto^* \lfloor v\rfloor$) and
(ii) label propagation provides a sound non-interference analysis for CDuce expressions

^ More formally we consider that a CDuce$_\mathcal{L}$ term denotes a pair formed by a CDuce term and
a map from the (root starting) paths of the abstract syntax tree of the term to possibly empty
lists of labels.
(i.e., if a label is not propagated, then the labeled expression does not interfere with
the result). It is not difficult to define a trivially sound analysis (just propagate all
labels)^. The hard task is to define a fine-grained analysis that propagates as few labels
as possible. Needless to say, the core of the definition is label propagation in pattern
matching, where several kinds of information flows are possible:

• Direct (or “explicit”) information flows, due to the binding of labeled expressions
  to variables, such as in:

      match e with <worker>[ x y <salary>[z] ] -> ... z ... | ...

  where the value of the salary is bound to z and flows through it into the first branch
  expression.

• Indirect (or “implicit”) information flows due to patterns: the information flows by
  the deconstruction of the matched value and/or the satisfaction of type constraints,
  such as in:

      match e with <worker>[ x y <salary>[1--900] ] -> e1 | ...

  where the information of the range of the salary flows into e1.

• Indirect information flows due to the use of a first-match policy: information can be
  acquired from the failure of the previous branch:

      match e with <worker>[ x y <salary>[1--900] ] -> e1 | _ -> e2

  where the information that the salary value is not in 1--900 flows into e2.

The last example also shows that non-interference is undecidable, as deciding the
non-interference of the value of the salary element is equivalent to deciding the equivalence
of e1 and e2 (CDuce is Turing-complete).

Once we have established whether a label must be propagated or not, the next problem
is to decide with which type we have to decorate it so as not to infringe
subject reduction. Consider the example of the application of a labeled function value
to another value: $(\ell_{s\to t} : v_1)\,v_2$. Surely the function, seen as a piece of data, interferes with
the computation. Therefore its label must be propagated. A sound solution is to reduce
the application to $\ell_u : (v_1 v_2)$, for a suitable choice of $u$. Here the choice for $u$ is easy: the
type system ensures that $v_2$ is of type $s$, therefore it suffices to take $u = t$. When the type
of the label is not an arrow but a generic type $t'$, then we resort to the $\bullet$ operator and set
$u$ equal to $t' \bullet s'$ for some $s'$ such that $v_2 : s'$.

Formally, we start by (i) defining CDuce$_\mathcal{L}$ values, which are obtained by adding to
the definition of CDuce values the production $v ::= \ell_t : v$; (ii) defining the semantics
of $v/p$ for the new values in the following case: $(\ell_t : (v_1,v_2))/(p_1,p_2) = v_1/p_1 \oplus v_2/p_2$;
and (iii) adding to evaluation contexts the production $\mathcal{E} ::= \ell_t : \mathcal{E}$. Note however that
we do not define a pattern for labels, as we want labels to describe information flow,
rather than to affect it.

Next we define the operator “$/\!/$” that calculates the set of labels that must be
propagated at pattern matching. More precisely, $e /\!/ p$ denotes the list of all labels that must
be propagated when matching expression $e$ against pattern $p$, and is defined in Figure 3
(where “@” and “::” are the “append” and “cons” list operators, respectively). This
definition forms the core of our analysis, therefore it is worth explaining it in some detail.
Let $\mathcal{L}(e)$ denote the set of labels occurring in $e$, then all the labels in $v /\!/ p$ are contained
in $\mathcal{L}(v)$. The idea is to collect in $v /\!/ p$ all the labels that mark expressions whose value
may affect the result of the matching. Intuitively, the definition of $/\!/$ must ensure that
if a label $\ell$ in $\mathcal{L}(v)$ is not in $v /\!/ p$, then for all $v'$ obtained by substituting in $v$ some

The analogous trivially complete analysis is the one that does not propagate any label. 
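$$\begin{array}{rcll}
v /\!/ x &=& \varnothing_{ok} & \\
v /\!/ (x := c) &=& \varnothing_{ok} & \\[4pt]
c /\!/ p &=& \varnothing_{ok} & \text{if } c \in \lbag p\rbag\\
c /\!/ p &=& \varnothing_{fail} & \text{if } c \notin \lbag p\rbag\\[4pt]
(\ell_t : v) /\!/ p &=& \varnothing_{fail} & \text{if } t \wedge \lbag p\rbag \simeq \mathbf{0}\\[4pt]
(\ell_{t'} : v) /\!/ t &=& \varnothing_{ok} & \text{if } t' \le t\\
(\ell_{t'} : v) /\!/ t &=& \ell :: (v /\!/ t) & \text{if } (t \wedge t' \not\simeq \mathbf{0}) \wedge (t' \not\le t)\\
(v_1,v_2) /\!/ t &=& (v_1 /\!/ \pi_1(t)) @ (v_2 /\!/ \pi_2(t)) & \text{if } \pi_i(t) \text{ are defined and } v_i /\!/ \pi_i(t) \neq \varnothing_{fail}\\
(v_1,v_2) /\!/ t &=& \varnothing_{fail} & \text{otherwise}\\
(\mathtt{fun}\ f^{(\ldots)}(x).e) /\!/ t &=& \varnothing_{ok} & \text{if } \mathtt{fun}\ f^{(\ldots)}(x).e / t \neq \Omega\\
(\mathtt{fun}\ f^{(\ldots)}(x).e) /\!/ t &=& \varnothing_{fail} & \text{if } \mathtt{fun}\ f^{(\ldots)}(x).e / t = \Omega\\[4pt]
v /\!/ p_1 \& p_2 &=& \varnothing_{fail} & \text{if } (v /\!/ p_1) = \varnothing_{fail} \text{ or } (v /\!/ p_2) = \varnothing_{fail}\\
v /\!/ p_1 \& p_2 &=& (v /\!/ p_1) @ (v /\!/ p_2) & \text{otherwise}\\[4pt]
(\ell_t : v) /\!/ (p_1,p_2) &=& \ell :: (v /\!/ (p_1,p_2)) & \text{if } t \wedge \lbag (p_1,p_2)\rbag \not\simeq \mathbf{0}\\
(v_1,v_2) /\!/ (p_1,p_2) &=& \varnothing_{fail} & \text{if } (v_1 /\!/ p_1) = \varnothing_{fail} \text{ or } (v_2 /\!/ p_2) = \varnothing_{fail}\\
(v_1,v_2) /\!/ (p_1,p_2) &=& (v_1 /\!/ p_1) @ (v_2 /\!/ p_2) & \text{otherwise}\\
(\mathtt{fun}\ f^{(\ldots)}(x).e) /\!/ (p_1,p_2) &=& \varnothing_{fail} & \\[4pt]
(\ell_t : v) /\!/ p_1|p_2 &=& (\ell_t : v) /\!/ p_1 & \text{if } t \le \lbag p_1\rbag\\
(\ell_t : v) /\!/ p_1|p_2 &=& (\ell_t : v) /\!/ p_2 & \text{if } t \wedge \lbag p_1\rbag \simeq \mathbf{0}\\
(\ell_t : v) /\!/ p_1|p_2 &=& \ell :: (v /\!/ p_1|p_2) & \text{otherwise}\\
v /\!/ p_1|p_2 &=& \varnothing_{ok} & \text{if } v /\!/ p_1 = \varnothing_{ok}\\
v /\!/ p_1|p_2 &=& \varnothing_{fail} & \text{if } v \neq c \text{ and } (v /\!/ p_1) = (v /\!/ p_2) = \varnothing_{fail}\\
v /\!/ p_1|p_2 &=& (v /\!/ p_1) @ (v /\!/ p_2) & \text{if } v \neq c,\ v \neq \ell_t : v',\ v /\!/ p_1 \neq \varnothing_{ok}, \text{ and}\\
 & & & (v /\!/ p_1) \text{ and } (v /\!/ p_2) \text{ are not both equal to } \varnothing_{fail}
\end{array}$$

Fig. 3. Propagating labels with patterns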



$\ell$-labeled occurrences by “admissible” (i.e. that respect the type constraints indexing $\ell$)
values, the match $v'/p$ either always succeeds or always fails^.

The first two cases in Figure 3 are the easiest ones insofar as variables and default-value
patterns always succeed; therefore no label needs to be propagated.

Matching a constant is also simple as there is no label to propagate. Note, however,
that we use an index to distinguish two different cases of non-propagation: the case in
which labels are not propagated because the pattern always fails ($\varnothing_{fail}$) and the case
in which they are not propagated because the pattern always succeeds ($\varnothing_{ok}$). When we
match a labeled value against a pattern $p$ and the type of the label does not intersect the
type accepted by $p$, then we know that the matching fails for every value that $v$ can take in $t$.

The case for a type constraint pattern is the key one, as CDuce’s pattern matching
is nothing but a highly sophisticated type-case with capture variables: if $t' \wedge t \simeq \mathbf{0}$, then
we are in the previous case so $v/t$ always fails; if $t' \le t$, then for every $v$ of type $t'$, $v/t$
always succeeds so no label is propagated; otherwise, for all $v'$ in the intersection of
$t$ and $t'$, $v'/t$ succeeds, while for $v'$ in their difference it fails, thus we propagate $\ell$ and
check for other labels.
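The three-way case analysis for type-constraint patterns can be sketched in Python as follows. This covers only the cases of Figure 3 for constants and labeled values matched against a type, under the simplifying assumption that a type is a finite set of constants; the names `Labeled`, `OK`, `FAIL`, and `propagate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet


@dataclass(frozen=True)
class Labeled:
    """A value marked with a label indexed by a type (a finite set of constants)."""
    label: str
    type: FrozenSet[Any]
    value: Any


OK = ((), "ok")      # no label propagated: the match always succeeds
FAIL = ((), "fail")  # no label propagated: the match always fails


def propagate(v, t):
    """Labels to propagate when matching v against the type constraint t.
    Returns (labels, index); the index is "" for non-empty label lists."""
    if isinstance(v, Labeled):
        if not (v.type & t):   # label type and pattern type are disjoint
            return FAIL
        if v.type <= t:        # every admissible value matches
            return OK
        inner, _ = propagate(v.value, t)
        return ((v.label,) + inner, "")  # outcome depends on the labeled value
    return OK if v in t else FAIL        # plain constant


INT = frozenset(range(0, 10000))
LOW = frozenset(range(1, 901))           # the type 1--900

print(propagate(Labeled("l", INT, 1200), LOW))                            # (('l',), '')
print(propagate(Labeled("l", frozenset(range(1000, 2001)), 1200), LOW))   # ((), 'fail')
print(propagate(Labeled("l", frozenset(range(1, 500)), 300), LOW))        # ((), 'ok')
```

Only in the first call can an observer learn something about the salary beyond its static type, which is exactly when the label is propagated.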



However, this is not enough to ensure non-interference: for instance consider
$(\ell_{\mathtt{Bool}} : v) /\!/ (\mathtt{true}\&(x := 1))\,|\,(\mathtt{false}\&(x := 0))$ which always succeeds.
When the matched value is a pair then we propagate the union of the labels
propagated by matching each sub-value against the corresponding projection of $t$ provided
that: (i) the projections are defined, because otherwise we are matching a pair against a
type with no product among its super-types (e.g., $s \to t$) and the pattern always fails, and
(ii) neither of the two sub-matchings fails, because the whole pattern would then fail and
therefore it would be pointless to propagate the labels of the other component. Note that this
case uses the distinction between $\varnothing_{ok}$ and $\varnothing_{fail}$, the other cases being in conjunction and
pair patterns (which fail if any sub-check fails) and in disjunction patterns (which
fail if both sub-checks fail). Since there is no pattern that deconstructs functions, these
can be soundly matched only against types (or captured). Note however that the type of
a function value is fixed by its interface. Therefore even if we make labeled expressions
occurring in the function vary, this does not affect the type of the function, ergo the
result of pattern matching. For this reason $(\mathtt{fun}\ f^{(\ldots)}(x).e) /\!/ t$ is dealt with in the same way
as constant values. The cases of conjunction patterns as well as those for pair patterns
need no particular insight. It may be worth just noticing that in the first case of pair
patterns $\ell$ is propagated also when the pattern always succeeds (i.e. $t \le \lbag (p_1,p_2)\rbag$) as
the simple fact of deconstructing the pair may yield interference^.

The cases for alternative patterns are more interesting as they take into account the
use of the first-match policy. In particular in the first two equations for alternative patterns,
propagation is calculated only on $p_1$ or $p_2$ according to whether the first match always succeeds
or always fails. If instead the result cannot be determined with certainty, then the label
is propagated and the search for other labels is continued on $v$. All remaining rules are
straightforward.

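Now that we have determined the labels to propagate we have to choose the type
constraints to decorate them with. Let $L$ be a list of labels, $t$ a type, and
$e$ an expression; we use the notation $(L)_t : e$ to denote the expression $\ell^1_t : \ldots : \ell^n_t : e$
obtained by prefixing $e$ by the $t$-indexed labels in $L$. This notation is used to index the
labels deduced by $/\!/$ and define the operational semantics of CDuce$_\mathcal{L}$ as follows (a
better definition can be found in the on-line extended version available at www.cduce.org):

$$\begin{array}{rcll}
v_1 v_2 &\leadsto& e[v_1/f;\, v_2/x] & \text{if } v_1 = \mathtt{fun}\ f^{(\ldots)}(x).e\\
(\ell_t : v_1)\,v_2 &\leadsto& \ell_{t \bullet s} : (v_1 v_2) & \text{if } \varnothing \vdash v_2 : s \text{ and } t \bullet s \text{ is defined}\\
\mathtt{match}\ v\ \mathtt{with}\ p_1 \to e_1 \mid p_2 \to e_2 &\leadsto& (v /\!/ p_1)_t : e_1[v/p_1] & \text{if } v/p_1 \neq \Omega \text{ and } \varnothing \vdash e_1[v/p_1] : t\\
\mathtt{match}\ v\ \mathtt{with}\ p_1 \to e_1 \mid p_2 \to e_2 &\leadsto& (v /\!/ \lbag p_1\rbag)_t : (v /\!/ p_2)_t : e_2[v/p_2] & \text{if } v/p_1 = \Omega,\ v/p_2 \neq \Omega\\
 & & & \text{and } \varnothing \vdash e_2[v/p_2] : t
\end{array}$$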



The rule for “unlabeled” applications does not change, while the one for applications
of a labeled function changes as we explained earlier in this section. Matching is clearly
the key rule. Branch selection is performed as in CDuce but the labels determined by $/\!/$
and indexed by the type of the reductum are prepended to the result. In particular if the
first branch is selected, then we just propagate the labels in $v /\!/ p_1$, as the second pattern
is not even checked. But if the second branch is selected then the result is prepended by
both $v /\!/ p_2$ and $v /\!/ \lbag p_1\rbag$: while the first set of labels highlights the first of the two kinds
of indirect flows present in CDuce, namely, those due to pattern matching, the second



For example, consider $(\ell_{\mathtt{Bool}\times t} : v) /\!/ ((\mathtt{true}\&(x := 1))\,|\,(\mathtt{false}\&(x := 0)),\ \_)$.
Left: labeled document

<company>[
  <worker>[
    <surname>m_String:"Durand"
    <name>n_String:"Paul"
    <salary>[ ℓ_Int:6500 ]
  ]
  <worker>[
    <surname>m_String:"Dupond"
    <name>n_String:"Jean"
    <salary>[ ℓ_Int:1800 ]
  ]
  <worker>[
    <surname>m_String:"Martin"
    <name>n_String:"Jules"
    <salary>[ ℓ_Int:1200 ]
  ]
]

Center: result of Q1

<company>[
  <worker>[
    <surname>m_String:"Durand"
    <name>n_String:"Paul"
  ]
  <worker>[
    <surname>m_String:"Dupond"
    <name>n_String:"Jean"
  ]
  <worker>[
    <surname>m_String:"Martin"
    <name>n_String:"Jules"
  ]
]

Right: result of Q2

<company>[
  (ℓ_<worker>[Surname Name] :
    <worker>[
      <surname>m_String:"Durand"
      <name>n_String:"Paul"
    ])
  (ℓ_<worker>[Surname Name] :
    <worker>[
      <surname>m_String:"Dupond"
      <name>n_String:"Jean"
    ])
]

Fig. 4. Labeled XML document and the results of queries Q1 and Q2



set of labels points to the other kind of flows possible in CDuce, that is those generated by
the branch dependency induced by the first-match policy. Since we selected the second branch,
$e_2$ knows that $p_1$ failed and, therefore, that the matched value $v$ does not belong to (the
semantic interpretation of) the type $\lbag p_1\rbag$. Therefore, we must also propagate all those
labels in $v$ whose content can be (even partially) deduced from the failure of $p_1$. By
definition these are the labels of $v /\!/ (\neg\lbag p_1\rbag)$ or equivalently (see Lemma 2) of $v /\!/ \lbag p_1\rbag$.

The reduction semantics we obtain is non-deterministic since in case of reduction of
a pattern matching or of a labeled function we resort to the type system for determining
the type decoration of the reductum’s labels (and we know that the CDuce type system does
not enjoy the minimum typing property). As a matter of fact, we have not described a
single analysis but a whole family of analyses. This is slightly annoying from a
theoretical viewpoint since we have to prove that all of them are sound so that, in practice,
we can use any of them. However this has no practical consequence:
first, since we deal with a finite number of labels and types it is always possible to find
the best analysis; second, since this problem already resides in the CDuce type system
a choice was already made by the implementation. So we follow the implementation
of CDuce and use in Figure 3 and in the operational semantics of CDuce$_\mathcal{L}$ the types
inferred by CDuce’s type-checker: we end up with the best analysis (in the family
described above) that is sound with the current implementation of CDuce.

Let us finally apply the analysis to the queries of Section 1. In order to trace how
the information in each piece of data flows in the queries, we label the content of each sub-element
of <worker> elements by a different label as shown in the left expression of Figure 4.
The results of the analysis of Q1 and Q2 respectively are the central and right
expressions in Figure 4^. Note that Q1 propagates only the labels of name and surname (the
propagation is a consequence of an explicit flow), thus the analysis has correctly
indicated that it is safe. The analysis of Q2 instead propagates also the salary label and
therefore is rejected as insecure. The propagation of ℓ_<worker>[Surname Name] is caused by

Sequences are encoded by right-associative nested pairs ended by `nil, XML elements
<tag attributes>s become triples (`tag, (attributes, s)), while transform is obtained by iterating matching
expressions on the elements, and encapsulating the results into a sequence.
the $/\!/$ operator in the first match rule in CDuce$_\mathcal{L}$’s operational semantics. In particular when
matching the pattern of transform against an element, the third rule for pairs in
Figure 3 decomposes the test over each projection, one of which calculates, say in the first
loop, $(\ell_{\mathtt{Int}} : 6500) /\!/ \mathtt{MoreThanMe}$ which by the second rule for types in Figure 3 is equal
to $\ell$. Note also how the position of the label denotes different kinds of properties: for
example we specified <salary>[ℓ:1200] since we wanted to capture only transformations
that depended on the content of the salary element, while if we had instead specified
ℓ:<salary>[1200] we would have captured also transformations that test the presence of
this element (in the case it were optional).

We want to stress that our analysis can check very complex security entailments.
As explained in the introduction, the independence of the result from salary can also
be ensured by access control techniques or by stripping salary values from the source
document. However, our technique allows one to check that even if a query can access
both salaries and names it cannot correlate them. To that end it suffices to verify that
the presence of an $m$ or of an $n$ label in the result implies the absence of the $\ell$ label, and
vice versa. According to this policy query Q1 would be accepted since $\ell$ does not occur
in the result while query Q2 would be rejected since all three labels are in the result.
A query that plainly returned the list of all salaries (without any name or surname) or
some statistic about them would be considered safe, too. More generally, by using label
propagation and some logic (e.g. propositional logic) on labels we can define complex
security policies whose verification, trivial in our technique, would be very hard (if not
impossible) by standard access control techniques, as we show in Section 6.
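Once the labels occurring in a result are known, checking such a policy is a propositional test over a set of labels. A minimal Python sketch, assuming labels are represented by the strings "m", "n", and "l" (for $\ell$) and that the set of labels in the result has already been collected; the function name is hypothetical:

```python
def satisfies_policy(labels):
    """Policy: names/surnames (m, n) and salaries (l) must not appear together."""
    labels = set(labels)
    reveals_identity = bool(labels & {"m", "n"})
    reveals_salary = "l" in labels
    return not (reveals_identity and reveals_salary)


print(satisfies_policy({"m", "n"}))       # Q1-like result: accepted (True)
print(satisfies_policy({"m", "n", "l"}))  # Q2-like result: rejected (False)
print(satisfies_policy({"l"}))            # salaries only: accepted (True)
```

Other policies are obtained by changing the propositional formula, with no change to the propagation analysis itself.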

Finally, we can recast the analysis scenarios we outlined in the introduction (Figure 1)
to the present setting. A sound static analysis for our system is an analysis that
computes labels that will surely be absent from the result of the dynamic analysis, while a
complete static analysis will determine labels that will surely be present in the same
result. Therefore, here completeness is stated with respect to the dynamic analysis rather
than with respect to the non-interference property. In this respect the work presented
here constitutes the cornerstone of the outlined architecture.



5 Properties 

In this section we briefly enumerate the various properties of our approach. For space
reasons proofs and less important lemmas are omitted or just sketched. They are all
reported in the extended version of the article available on-line.

In what follows, when we state that $e$ is a CDuce or CDuce$_\mathcal{L}$ expression we
implicitly mean that $e$ is a well-typed (CDuce or CDuce$_\mathcal{L}$) expression.

Lemma 1 (Strip). Let $e$ be a CDuce$_\mathcal{L}$ expression. If $e \leadsto e'$, then $\lfloor e\rfloor \leadsto^* \lfloor e'\rfloor$.

This lemma has two important consequences, namely (i) that CDuce$_\mathcal{L}$ is a conservative
extension of CDuce with respect to the reduction theory, and (ii) that despite being
non-deterministic the reduction of CDuce$_\mathcal{L}$ preserves the semantics of CDuce programs:

Corollary 1. Let $e$ be a CDuce$_\mathcal{L}$ expression. If $e \leadsto e'$ and $e \leadsto e''$, then $\lfloor e'\rfloor = \lfloor e''\rfloor$.

The soundness of the CDuce$_\mathcal{L}$ type system is proved by subject reduction and is
instrumental to the soundness of the analysis.
Theorem 1 (Subject reduction). Let $e$ be a CDuce$_\mathcal{L}$ expression. If $\Gamma \vdash e : t$ and
$e \leadsto e'$, then $\Gamma \vdash e' : t$.

The next lemma is useful to understand the match reduction rules of CDuce$_\mathcal{L}$.

Lemma 2. For every value $v$ and type $t$, $(v /\!/ t) = (v /\!/ \neg t)$ (modulo the indexes of $\varnothing$).

In order to prove non-interference we need two lemmas, one that characterizes $/\!/$ and
a second that is the non-interference counterpart of the standard substitution lemma in
typed $\lambda$-calculi:

Lemma 3. Let $p$ be a pattern, $v$ a CDuce$_\mathcal{L}$ value with an occurrence $\Delta$ such that
$v_\Delta = \ell_t : v_1$. If $\ell \notin v /\!/ p$, then for all $v'$ such that $\varnothing \vdash v' : t$,
$(v^\Delta[v']/p = \Omega \iff v/p = \Omega)$ holds true.

Lemma 4 (Substitution). Let $p$ be a pattern, $v$ a CDuce$_\mathcal{L}$ value whose occurrence $\Delta$
is such that $v_\Delta = \ell_t : v'$. If $\ell$ does not occur in the image of $v/p$ and $\ell \notin v /\!/ p$, then for all
$v''$ such that $\varnothing \vdash v'' : t$, we have that $v^\Delta[v'']/p$ is pointwise equal to $v/p$.

We can state the non-interference theorem: note that the non-determinism of CDuce$_\mathcal{L}$’s
reduction is accounted for by quantifying over all results $v$ of the expression $e$.

Theorem 2 (Non-interference). Let $e$ be a CDuce$_\mathcal{L}$ expression, with an occurrence $\Delta$
such that $e_\Delta = \ell_t : e'$. For every value $v$ such that $e \leadsto^* v$, if $\ell \notin \mathcal{L}(v)$, then $\Delta$ is
non-interfering in $\lfloor e\rfloor$ with respect to $t$, i.e., $\forall v' \in$ CDuce$_\mathcal{L}$ s.t. $\varnothing \vdash v' : t$, we have
$\lfloor e\rfloor^\Delta[\lfloor v'\rfloor] \leadsto^* \lfloor v\rfloor$.

By using Proposition 2 it is easy to see that the conclusion of the theorem implies
Definition 2 of non-interference, justifying in this way the “i.e.” we used in the
statement of the theorem.



6 A Last Example 



We end our presentation by commenting on a more elaborate example that illustrates
the use of our technique to define and verify complex security policies that cannot be
expressed in terms of access control. Suppose that XML documents store information
about persons who have to pass some examination. The form of the documents is
described by the following type declarations:

    type ExamBase = <exam_base>[Person*]
    type Person   = <person gender="M"|"F">[Name Birth Grade?]
    type Name     = <name>String
    type Birth    = <birth>[Year Month Day]
    type Year     = <year>[Int]
    type Month    = <month>MName
    type MName    = "Jan"|"Feb"|"Mar"|"Apr"| ...
    type Day      = <day>[1--31]
    type Grade    = <grade>[Int]

As we see, every document records a list of persons, each with a name, personal
information, and an optional <grade> element that stores the numerical result of the
examination. The absence of such an element denotes that the person has not yet
passed the examination (that is, has either not taken it, or taken it and failed). An
XML document that conforms to this schema is shown in the left column of Figure 5,
while the central column reports the result of importing the same document in CDuce.



Imagine that examination documents can be accessed by three different categories
of users: academic staff, administrative staff, and normal users. We want academic staff
to have unconstrained access to the information stored in the examination documents,
while we may wish to constrain the accesses of administrative and normal users.
Examples of security requirements that we may wish to enforce are:

Left column (XML):

    <?xml version="1.0"?>
    <exam_base>
      <person gender="M">
        <name>Durand</name>
        <birth>
          <year>1970</year>
          <month>Aug</month>
          <day>10</day>
        </birth>
        <grade>110</grade>
      </person>
      <person gender="M">
        <name>Dupond</name>
        <birth>
          <year>1953</year>
          <month>Apr</month>
          <day>22</day>
        </birth>
      </person>
      <person gender="F">
        <name>Dubois</name>
        <birth>
          <year>1965</year>
          <month>Sep</month>
          <day>2</day>
        </birth>
        <grade>120</grade>
      </person>
    </exam_base>

Central column (CDuce):

    let eb : ExamBase =
    <exam_base>[
      <person gender="M">[
        <name>"Durand"
        <birth>[
          <year>[1970]
          <month>"Aug"
          <day>[10]
        ]
        <grade>[110]
      ]
      <person gender="M">[
        <name>"Dupond"
        <birth>[
          <year>[1953]
          <month>"Apr"
          <day>[22]
        ]
      ]
      <person gender="F">[
        <name>"Dubois"
        <birth>[
          <year>[1965]
          <month>"Sep"
          <day>[2]
        ]
        <grade>[120]
      ]
    ]

Last column (labeled CDuce; each label is written with its type index after an underscore):

    let eb : ExamBase =
    <exam_base>[
      <person gender= stat_("M"|"F") : "M">[
        <name> name_String : "Durand"
        <birth>[
          <year>[ stat_Int : 1970 ]
          <month> private_MName : "Aug"
          <day>[ private_(1--31) : 10 ]
        ]
        passed_Grade : <grade>[ result_Int : 110 ]
      ]
      <person gender= stat_("M"|"F") : "M">[
        <name> name_String : "Dupond"
        <birth>[
          <year>[ stat_Int : 1953 ]
          <month> private_MName : "Apr"
          <day>[ private_(1--31) : 22 ]
        ]
      ]
      <person gender= stat_("M"|"F") : "F">[
        <name> name_String : "Dubois"
        <birth>[
          <year>[ stat_Int : 1965 ]
          <month> private_MName : "Sep"
          <day>[ private_(1--31) : 2 ]
        ]
        passed_Grade : <grade>[ result_Int : 120 ]
      ]
    ]

Fig. 5. A database of examinations: in XML, in CDuce, and in labeled CDuce



1. Only academic users can have information both on names and grades or on names
and birthdays simultaneously.

2. The administrative users can check whether a person passed the examination (that
is, they can check for the presence of a <grade> element) but cannot access the result.

3. Every user can ask for statistical results on grades upon criteria limited to year 
of birth and gender (so that they cannot select sufficiently restrictive sets to infer 
personal data). 



To dynamically verify these constraints we introduce five labels that we use to
classify the information stored in documents: private (that classifies the month and the
day of birth), stat (that classifies the year of birth and the gender attribute), name
(that classifies names), passed (that classifies grade elements), and result (that
classifies the content of grade elements). Rather than document-wise, this classification
is described directly on types, as shown by the following definitions:

    type Person = <person gender= stat:("M"|"F")>[ Name Birth (passed:Grade)? ]
    type Name   = <name>(name:String)
    type Year   = <year>[ stat:Int ]
    type Month  = <month>(private:MName)
    type Day    = <day>[ private:(1--31) ]
    type Grade  = <grade>[ result:Int ]

Note that in these definitions labels have no indexes (in documents they will be indexed
by the types they are labeling). Before executing a query the system uses this
specification to generate a labeled version of the document as shown in the last column
of Figure 5. The query is then executed on the labeled document (according to the
semantics of the labeled variant of CDuce) and the following constraints* are checked
in the result:

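If the owner of the query is a normal user, then the result must satisfy:

    $(\mathit{name} \Rightarrow \neg(\mathit{private} \lor \mathit{stat} \lor \mathit{result} \lor \mathit{passed})) \land (\mathit{private} \Rightarrow \neg(\mathit{stat} \lor \mathit{result}))$

If the owner of the query is an administrative user, then the result must satisfy:

    $(\mathit{name} \Rightarrow \neg(\mathit{private} \lor \mathit{stat} \lor \mathit{result})) \land (\mathit{private} \Rightarrow \neg(\mathit{stat} \lor \mathit{result}))$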

where a propositional label is satisfied if and only if the label is present in the result. 
Thus, for instance, the second constraint must be read: if the label name occurs in the 
result then private, stat, and result cannot occur in it, and if private occurs in the result 
then stat, and result do not. The difference with respect to the constraint for normal 
users is that the second constraint allows name and passed to occur simultaneously in 
the same result. Therefore a query that just tested the presence of a <grade> element 
without checking its content would satisfy this second constraint. 
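
As an illustration only, the two constraints can be evaluated mechanically once the result of a query is abstracted as the set of label names occurring in it. This abstraction, and the function names `implies_none`, `normal_user_ok`, and `admin_user_ok`, are assumptions of this sketch and are not part of the system described here.

```python
# Sketch: a query result is abstracted as the set of label names that occur in it.
# A propositional label is "satisfied" iff it is a member of that set.

def implies_none(labels, trigger, forbidden):
    """Encode  trigger => not (f1 or f2 or ...)  over a set of label names."""
    return trigger not in labels or not (labels & forbidden)


def normal_user_ok(labels):
    """Constraint for normal users: name excludes private, stat, result, passed."""
    return (implies_none(labels, "name", {"private", "stat", "result", "passed"})
            and implies_none(labels, "private", {"stat", "result"}))


def admin_user_ok(labels):
    """Constraint for administrative users: as above, but name and passed may co-occur."""
    return (implies_none(labels, "name", {"private", "stat", "result"})
            and implies_none(labels, "private", {"stat", "result"}))


# A query that only tests for the presence of a <grade> element next to a name:
presence_test = {"name", "passed"}
print(admin_user_ok(presence_test))   # allowed for administrative users
print(normal_user_ok(presence_test))  # rejected for normal users
```

A statistical query whose result carries only the labels stat and result satisfies both constraints, which matches the third requirement above.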

If the result of a query satisfies the corresponding constraint, then its stripped ver- 
sion is returned to the owner of the query. 

7 Perspectives 

This paper contains exploratory work toward the definition of information-flow security
in XML transformations. As such it opens several perspectives, on both the practical
and the theoretical side.

First and foremost the cost of the dynamic analysis must be checked against an 
implementation (we are currently working on it). We expect this cost to be reasonably 
low: CDuce’s pattern matching fully relies on dynamic type checking, therefore if we 
embed the dynamic generation of typed security labels in CDuce’s runtime the resulting 
overhead should be small. Also, to fill the gap with practice we must devise expressive 
and user-friendly ways to describe the labeling of XML documents in the meta-data 
(that is, XML Schemas and DTDs) and to express the associated security policies (for 
instance, in Section 6 we expressed them in propositional logic). These security prop- 
erties should be defined by sets of constraints (i.e., formulae of an appropriate logic) 
that are automatically generated from a specification expressed in an “ad hoc” language 
(e.g., like the authorizations defined in [6]). 

The precision of the dynamic analysis must be enhanced by program rewriting
techniques. To exemplify, an obvious “optimization” consists in rewriting all closed (i.e.,
without capture variables) patterns into an equivalent type constraint pattern, by
replacing & with ∧, | with ∨, and (,) with ×, so as to transform the pattern (s|t, u&v)
into the type (s∨t)×(u∧v). Indeed, the analysis on types is more precise than that
on equivalent patterns, as the latter is recursively applied to subcomponents forgetting,
in doing so, many interdependencies. But other subtler rewritings could improve our
analysis: for instance p\ = (x&true)|(x&false) is equivalent to p 2 = x&(true Vfalse) but 
{(.BooC-e) U P\ = P\) while {(-BooC-e) U Pi = 0ok- This discrepancy can be identified
with the fact that the analyses of pair, intersection, and union patterns are performed
independently on the two subpatterns. Thus a possible way to tackle this problem is by
transferring some information from one pattern to the other, for example by mimicking
the automaton-based technique used by CDuce for the just-in-time compilation of
patterns.

* For the sake of the example we expressed these constraints in a propositional logic where the
labels are atoms, but different languages are possible. The definition of such languages is out
of the scope of the paper and a matter of future research: see Section 7.

Finally note that one of the main technical novelties of this work is to endow se- 
curity labels with constraints. The constraints at issue are quite simple, since they just 
express the static knowledge of the type of the labeled expression. It is then natural 
to think of much more expressive constraints. For example we can think of endowing 
labels with integrity constraints and define non-interference just in terms of consistent 
databases. In this perspective a program that checked the integrity of a base would be 
always interference-free even if it accessed private information. Of course checking 
non-interference in this case would be even more challenging but it could pave the way 
to (security) proof carrying code for XML. 

Abstract. The concept of unreliable failure detectors for reliable distributed
systems was introduced by Chandra and Toueg as a fine-grained
means to add weak forms of synchrony into asynchronous systems. Various
kinds of such failure detectors have been identified as each being
the weakest to solve some specific distributed programming problem. In
this paper, we provide a fresh look at failure detectors from the point of 
view of programming languages, more precisely using the formal tool of 
operational semantics. Inspired by this, we propose a new failure detec- 
tor model that we consider easier to understand, easier to work with and 
more natural. Using operational semantics, we prove formally that rep- 
resentations of failure detectors in the new model are equivalent to their 
original representations within the model used by Chandra and Toueg. 



1 Executive Summary 

Background. In the field of Distributed Algorithms, a widely-used computation 
model is based on asynchronous communication between a fixed number n of 
connected processes, where no timing assumptions can be made. Moreover, pro- 
cesses are subject to crash-failure: once crashed, they do not recover. The concept 
of unreliable failure detectors was introduced by Chandra and Toueg [CT96] as a 
fine-grained means to add weak forms of synchrony into asynchronous systems. 
Various kinds of such failure detectors have been identified as each being the 
weakest to solve some specific distributed programming problem [CHT96] . 

The two communities of Distributed Algorithms and Programming Lan- 
guages do not always speak the same “language” . In fact, it is often not easy to 
understand each other’s terminology, concepts, and hidden assumptions. Thus, 
in this paper, we provide a fresh look at the concept of failure detectors from 
the point of view of programming languages, using the formal tool of operational 
semantics. This paper complements previous work [NFM03] in which we used an 
operational semantics for a distributed process calculus to formally prove that a 
particular algorithm (also presented in [CT96]) solves the Distributed Consensus 
problem. Readers who are interested in proofs about algorithms within our new 
model (rather than proofs about it) are thus referred to our previous paper. 

Hasler Foundation, grant No. DICS 1825, an EPFL start-up grant, and the EU
FET-GC project PEPITO.
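
Table 1. Uniform “Abstract” Operational Semantics Scheme

(env)
$$\frac{\text{``failure detection events happen in the environment''}}{\Gamma \to \Gamma'}$$

(tau)
$$\frac{\Gamma \to \Gamma' \qquad N \xrightarrow{\tau@i} N' \qquad \text{``$i$ not crashed in $\Gamma$''}}{\Gamma \vdash N \to \Gamma' \vdash N'}$$

(suspect)
$$\frac{\Gamma \to \Gamma' \qquad N \xrightarrow{\mathit{suspect}_j@i} N' \qquad \text{``$i$ not crashed in $\Gamma$''} \qquad \boxed{\text{``$j$ may be suspected by $i$ in $\Gamma$''}}}{\Gamma \vdash N \to \Gamma' \vdash N'}$$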



The work of Chandra and Toueg emphasized the axiomatic treatment of
qualitative properties rather than quantitative ones. Like them, we currently
focus on issues of correctness, not of performance. Moreover, Chandra and
Toueg did not primarily aim at providing concrete design support for an
implementation of failure detectors. Like them, we rather seek mathematically
useful and convincing semantic abstractions of failure detectors.

Vehicle of Discussion. In Table 1, we propose a uniform scheme to describe the
operational semantics of process networks in the context of failure detectors. For
convenience, we abstract from the way in which the steps $N \to N'$ of process
networks are generated (from the code that implements the respective algorithm),
so we do not provide rules for this. It is sufficient for our purposes to observe that
a process i in a network carries out essentially two kinds of transitions $N \to N'$,
distinguished by whether or not they require the suspicion of some process j by
process i. Formally, we use the labels $\mathit{suspect}_j@i$ and $\tau@i$ to indicate these two kinds.

In summary, Table 1 presents a two-layered operational semantics scheme.
One layer, in addition to the transitions $N \to N'$ of process networks, also
describes the transitions $\Gamma \to \Gamma'$ of the network’s environment, keeping track
of crashes and providing failure detection, as indicated by rule (env). Another
layer, with the rules (tau) and (suspect), deals with the compatibility of network
and environment transitions, conveniently focusing on the environment
conditions for the two kinds of transitions of process networks. For example, the
boxed condition exploits the failure detector information that in our scheme is
to be provided via the environment component $\Gamma$.

A system run in this uniform scheme is an infinite sequence of transitions

$$\Gamma_0 \vdash N_0 \to \Gamma_1 \vdash N_1 \to \cdots \to \Gamma_t \vdash N_t \to \cdots$$

that we often simply abbreviate as $(\Gamma_t \vdash N_t)_{t \in \mathbb{N}_0}$. We also use the projections
onto the respective environment runs $(\Gamma_t)_{t \in \mathbb{N}_0}$ and network runs $(N_t)_{t \in \mathbb{N}_0}$, which
exist by definition of the rules (tau) and (suspect) of Table 1.

Overview. We start the main part of the paper with an introduction (§2) to the
various kinds of failure detectors proposed by Chandra and Toueg, including Ω,
which appeared in [CHT96]. Already in this introduction, we rephrase their
previous work using the scheme of Table 1. In addition, we use this exercise to come
up with a well-motivated proposal for a new model and way to represent failure
detectors (§3). We formalize our proposal according to the scheme of Table 1 and
redefine all previously introduced FDs within the new model (§4). Exploiting the
common scheme and the formality of the framework of operational semantics, we
formally prove that our redefinitions are “equivalent” to the original definitions
by a mutual simulation of all possible system runs that are derivable in either
case (§5) and draw some conclusions from having done so (§6).

Contribution. In summary, this paper contains an original presentation, using 
operational semantics, of existing work by Chandra and Toueg, which is targeted 
at an audience in process calculi and programming language semantics. The 
paper also provides, as its main contribution, a new model to represent failure 
detectors that tries to eliminate a number of drawbacks of the original model 
used by Chandra and Toueg. Many other failure detectors have been studied 
in the literature; for the current paper, we restrict our attention to the ones 
introduced in [CT96,CHT96]. The technical contribution is a formal comparison 
of the representations of these classical failure detectors in the new model with 
their representations on the old model, which was greatly simplified by having 
both models fit the scheme of Table 1. To conclude, we argue that our new model 
for FDs is easier to understand, easier to work with, and more natural than the 
one used by Chandra and Toueg (see the justification in §6). 

Related Work. We are not aware of any related or competing approaches. 

Acknowledgments. We very much thank Andre Schiper and Sam Toueg for en- 
lightening discussions about failure detectors and, more generally, distributed 
algorithms, but they may not necessarily agree with all conclusions that we drew 
from our work. Many thanks also to Pawel Wojciechowski and the anonymous 
referees for their comments on a draft of this paper. 

2 A Fresh Look at the Model of Chandra and Toueg 

Recall that we are addressing asynchronous message-passing distributed systems
with no bounds on message delays and a fixed number n of processes. Let P :=
{1, ..., n} denote the set of process names. All processes are supposed to run the
very same algorithm. Processes may crash; when they do so, they do not recover.
Systems evolve in time. T denotes some discrete time domain; for simplicity,
we may just assume that $T = \mathbb{N}_0$. At any time, the state of a system is fully
determined by the states of the individual processes while running the algorithm,
together with the messages currently present in the global message buffer. We
do not formalize global states, but treat them abstractly throughout the paper.

2.1 Schedules

A schedule, of the form (T, I, S), is essentially a sequence S of global steps in
time T, while running some algorithm starting within the initial global state I,
where the message buffer is empty. Sometimes, we refer to just S as being the 
schedule. A step is usually produced by any one of the n processes according to 
the algorithms’ instructions: in atomic fashion, a process receives some messages 
(possibly none) from the message buffer, possibly checks whether it is allowed 
to suspect another process, and sends out new messages (possibly none) to the 
message buffer, while changing its state. Often, it is left rather informal and 
imprecise how steps are actually defined, and there are a number of variations 
for this. In both papers [CT96,CHT96], it is assumed that schedules are infinite. 

A Simple Operational Semantics View. To avoid the details of generating
global steps from an algorithm, and to abstract away from the details of
message-passing, we model schedules simply as infinite sequences of labeled transitions

$$N \xrightarrow{\mu@i} N'$$

that denote steps between abstract global states (ranged over by N) by performing
an action $\mu$, due to the activity of some process i. The label $\mu$ depends on
whether or not i needs to (be able to) suspect another process j. If it does,
we indicate this by a transition arrow labeled with $\mu := \mathit{suspect}_j$; otherwise we
simply use the label $\mu := \tau$, which is commonly used to indicate that “some”
not further specified activity takes place at process i.

In this paper, we are not at all interested in how schedules themselves are 
generated. An example of this can be found in our earlier paper [NFM03], where 
we used a reasonably standard process calculus to do this. 

2.2 Unreliable Failure Detectors 

According to Chandra and Toueg [CT96], a failure detector (FD) is a separate
module attached locally to a process; each process i has its own private $FD_i$.
At any moment in time, each FD outputs a list of (names of) processes that it
currently suspects to have crashed. FDs are unreliable: they may

— make mistakes, 

— disagree among them, and 

— change their mind indefinitely often. 

Process i interacts with its $FD_i$ explicitly by only being allowed to suspect
another process j at any given time t if the output list of $FD_i$ contains j at this time.

Example 1 (“Application”). When a process needs the help of another process
to go on (e.g., via reception of a message), it may typically specify to
“either wait for this process, or suspect it and continue otherwise”. However,
it is only allowed to choose the second option if its FD permits it at the very
moment that the process looks at the FD's output, which it may have to do
infinitely often if the FD insists on not permitting the required suspicion.

More formally, the concept of process crashes is modeled by means of failure
patterns $F : T \to 2^P$ that describe monotonically when in a run crashes happen.
For example, F(42) = {3, 7} means that processes 3 and 7 have crashed during
the time interval [0, 42]. Similarly, the concept of failure detection is modeled by
so-called failure detector histories $H : T \times P \to 2^P$. For example, H(42, 5) = {6, 7}
means that at time 42 processes 6 and 7 are suspected by the FD of process 5.
Given the previous example F, this means that process 7 is correctly suspected,
while process 6 is erroneously suspected. Mathematically, a FD is a function^

$$\mathcal{D} : (T \to 2^P) \to 2^{(T \times P \to 2^P)}$$

that maps failure patterns F to sets of failure detector histories. Such sets may
be specified by additional properties, as exemplified in Section 2.3. From now
on, whenever we mention some F and H in the same context, then $H \in \mathcal{D}(F)$
is silently assumed; we may write $H_{\mathcal{D}}$ to indicate the referred FD.
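
The running example can be replayed with a small sketch in which F and H are ordinary functions. The crash times chosen below (process 3 at time 10, process 7 at time 40) and the helper name `classify` are hypothetical; only the values F(42) = {3, 7} and H(42, 5) = {6, 7} come from the text.

```python
# Sketch of a failure pattern F : T -> 2^P and a detector history H : T x P -> 2^P.

CRASH_TIME = {3: 10, 7: 40}  # hypothetical crash times, chosen so that F(42) = {3, 7}


def F(t):
    """Failure pattern: the processes that have crashed during [0, t] (monotone in t)."""
    return {p for p, tc in CRASH_TIME.items() if tc <= t}


def H(t, q):
    """Detector history: the processes suspected by the FD of process q at time t."""
    return {6, 7} if (t, q) == (42, 5) else set()


def classify(t, q):
    """Split the suspicions of q at time t into (correct, erroneous) ones."""
    suspected = H(t, q)
    return suspected & F(t), suspected - F(t)


print(F(42))            # the crashed processes at time 42
print(classify(42, 5))  # process 7 is correctly suspected, process 6 erroneously
```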

Finally, system runs R are quintuples (F, H, T, I, S). Subject to the shared
time domain T, the schedule S of a run is required to be “compatible” with the
failure pattern F and detector history H: (1) a process cannot take a step after
it has crashed (according to F); (2) when a process takes a step and queries
its failure detector module, it gets the current value output by its local failure
detector module (according to H).

A process is correct in a given run, if it does not crash in this run. It may,
though, crash in other runs. Let $\mathit{crashed}(R) := \mathit{crashed}(F) := \bigcup_{t \in T} F(t)$ denote
the processes that have crashed in run R according to its failure pattern F.
Consequently, $\mathit{correct}(R) := \mathit{correct}(F) := P \setminus \mathit{crashed}(F)$. Usually, one considers
only runs in which $\mathit{correct}(R) \neq \emptyset$, i.e., in every run at least one process survives.
Sometimes, as for the Consensus algorithm that we studied in [NFM03], it is even
required that fewer than n/2 processes may crash. Abstractly, we use maxfail(n)
to denote the maximal number of crashes permitted in a run.



A Simple Operational Semantics View. To make the model fit our universal
scheme of Table 1, we need to recast the information contained in failure
patterns F and failure detector histories H in an evolutionary manner as
environment transitions. Both F and H are totally defined over the whole time
domain T. Thus, we may simply use transitions $(t, F, H) \to (t+1, F, H)$, in which
time t just passes, while we leave F and H unchanged. Rule (T-env) of Table 2
serves us to generate such transitions formally.

System configurations are of the form $\Gamma \vdash N$, where $\Gamma$ is an element of
the domain $T \times (T \to 2^P) \times (T \times P \to 2^P)$, and N represents some state of the
algorithm. The rules (T-tau) and (T-suspect) in Table 2 formally describe the
conditions for a transition of an algorithm in state N to produce system transitions
depending on the current information about crashes and failure detectors
at any time $t \in T$. Note that in both cases the process i that is responsible for
the transition must not (yet) have crashed at the time t when the transition
is supposed to happen: $i \notin F(t)$. Note further that if it is required to suspect
some process j to perform the transition, then the respective failure detector
of process i must currently permit this: $j \in H(i,t)$. It is easily possible to
generalize this representation to the case where, to carry out a single transition,
process i would need to suspect more than one other process; for simplicity, we
only consider a single suspicion.

Table 2. Operational Semantics for the Failure Detectors of [CT96]

(T-env)
$$\frac{}{(t,F,H) \to (t+1,F,H)}$$

(T-tau)
$$\frac{(t,F,H) = \Gamma \to \Gamma' \qquad N \xrightarrow{\tau@i} N' \qquad i \notin F(t)}{\Gamma \vdash N \to \Gamma' \vdash N'}$$

(T-suspect)
$$\frac{(t,F,H) = \Gamma \to \Gamma' \qquad N \xrightarrow{\mathit{suspect}_j@i} N' \qquad i \notin F(t) \qquad j \in H(i,t)}{\Gamma \vdash N \to \Gamma' \vdash N'}$$

^ The term failure detector is overloaded to denote the single devices that are attached
to processes, as well as the mathematical object that governs the output that any
of these single devices may yield during runs.
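
The side conditions of (T-tau) and (T-suspect) can be read as a predicate on the current environment. The following is a minimal sketch under that reading; the function name `step_allowed` and the concrete F and H are illustrative assumptions, and H takes its arguments in the order (i, t) used in the rules.

```python
# Sketch: environment-side conditions of the rules (T-tau) and (T-suspect).
from typing import Callable, Optional, Set


def step_allowed(t: int,
                 F: Callable[[int], Set[int]],
                 H: Callable[[int, int], Set[int]],
                 i: int,
                 suspected: Optional[int] = None) -> bool:
    """Return True iff process i may take its step at time t.

    suspected=None models a tau@i step; suspected=j models a suspect_j@i step.
    """
    if i in F(t):              # both rules require i not in F(t)
        return False
    if suspected is None:      # (T-tau): no further condition
        return True
    return suspected in H(i, t)  # (T-suspect): j in H(i, t)


# Hypothetical environment: process 3 crashes at time 10 and is suspected
# by every process from time 12 on.
def F(t):
    return {3} if t >= 10 else set()


def H(i, t):
    return {3} if t >= 12 else set()


print(step_allowed(11, F, H, 1, suspected=3))  # crash not yet detected
print(step_allowed(12, F, H, 1, suspected=3))  # suspicion now permitted
```

The environment step of (T-env) is simply the passage from time t to t+1, so it needs no predicate of its own.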

System runs can now be fit together dynamically as infinite sequences of 
system transitions that are derivable by operational semantics rules. 

Definition 1. A $T(\mathcal{D})$-run is an infinite sequence $((t,F,H) \vdash N_t)_{t \in T}$ generated
by (T-env), (T-tau), and (T-suspect), for some F, H with $H \in \mathcal{D}(F)$.

2.3 “Sufficiently Reliable” Failure Detectors 

Probably the main novelty of Chandra and Toueg’s paper [CT96] was the definition
and study of a number of FDs $\mathcal{D}$ that only differ in their degree of reliability,
as expressed by a combination of safety and liveness properties. These are
formulated in terms of permitted and enforced suspicions according to the respective
failures reported in F and the failure detection recorded in $H \in \mathcal{D}(F)$:

completeness addresses crashed processes that must be suspected
by (the FDs of) “complete” processes,
accuracy addresses correct processes that must not be suspected
by (the FDs of) “accurate” processes.

Note that these definitions refer to suspicions allowed by the output of the
individual FDs at any time (according to H). By “complete” and “accurate”
processes, we indicate that there is some flexibility in the definition of the set of
processes that the property shall be imposed on. Note that H is, in principle,
a total function. Therefore, at any moment, it provides FD output for every
process (crashed or not), so there are at least three obvious possibilities to
define the meaning of “complete” and “accurate” processes:

1. all processes ($\in P$)

2. only processes that are still alive at time t ($\in P \setminus F(t)$)

3. only correct processes ($\in \mathit{correct}(F) = P \setminus \mathit{crashed}(F)$)

Obviously, it does not make much sense to speculate, as in solution 1, about the
output of FDs of crashed processes, because the respective process would never
ever again contact its FD^. It seems much more natural to select solution 2,
because it precisely considers just those processes that are alive. If completeness
or accuracy shall hold only eventually, then solution 3 becomes interesting: since
every incorrect process that is still alive at some moment will crash later on, the
property would just hold at a later time. In infinite runs, and for properties with
an eventual character, solutions 2 and 3 become “equivalent” [CT96].



Eight Candidates. Various instantiations of completeness and accuracy have
been proposed. We recall the eight FDs of Chandra and Toueg, defined by all
possible combinations of the following variations of completeness and accuracy.
Note that their defining properties are quantified over all possible runs. For every
given run, the components F and $H \in \mathcal{D}(F)$ are fixed, as well as the derived
notions of which processes are considered to be correct in this run.

strong completeness. Eventually, every process that crashes is permanently
suspected by every correct process.

$$\forall F, H : \exists \hat{t} : \forall p \in \mathit{crashed}(F) : \forall q \in \mathit{correct}(F) : \forall t \ge \hat{t} : p \in H(q,t)$$

weak completeness. Eventually, every process that crashes is permanently
suspected by some correct process.

$$\forall F, H : \exists \hat{t} : \forall p \in \mathit{crashed}(F) : \exists q \in \mathit{correct}(F) : \forall t \ge \hat{t} : p \in H(q,t)$$

Combined with the strong/weak versions of completeness, the following notions
of accuracy induce eight variants of FDs, with their denotations listed in brackets.

strong accuracy. ($\mathcal{P}/\mathcal{Q}$) No process is suspected before it crashes.

$$\forall F, H : \forall t : \forall p, q \in P \setminus F(t) : p \notin H(q,t)$$

Note that the “accuracy set” is the alive processes.

weak accuracy. ($\mathcal{S}/\mathcal{W}$) Some correct process is never suspected.

$$\forall F, H : \exists p \in \mathit{correct}(F) : \forall t : \forall q \in P \setminus F(t) : p \notin H(q,t)$$

Note that also here the “accuracy set” is the alive processes.

eventual strong accuracy. ($\Diamond\mathcal{P}/\Diamond\mathcal{Q}$) There is a time after which correct
processes are not suspected by any correct process.

$$\forall F, H : \exists \hat{t} : \forall t \ge \hat{t} : \forall p \in \mathit{correct}(F) : \forall q \in \mathit{correct}(F) : p \notin H(q,t)$$

eventual weak accuracy. ($\Diamond\mathcal{S}/\Diamond\mathcal{W}$) There is a time after which some correct
process is never suspected by any correct process.

$$\forall F, H : \exists \hat{t} : \exists p \in \mathit{correct}(F) : \forall t \ge \hat{t} : \forall q \in \mathit{correct}(F) : p \notin H(q,t)$$

^ It might be more appropriate to define H as a partial function where H(i,t) is only
defined if $i \notin F(t)$. One might also conclude that the (F, H)-based model is too rich.

Note that, except under strong and weak accuracy, (the FDs of) processes that
crash (in a given run) may behave completely unconstrained (in this run) before
they have crashed. Although the formulation of the above FDs might appear a
bit ad hoc (see Gärtner [Gar01] for a gentle and systematic overview), some of
them are known to provide the weakest FDs required to solve certain well-known
distributed programming problems: $\Diamond\mathcal{W}$ solves Consensus, where fewer than n/2
processes may crash; $\mathcal{S}$ solves Consensus, where fewer than n processes may crash;
$\mathcal{P}$ solves the Byzantine Generals Problem [CT96].
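
The completeness and accuracy properties above quantify over infinite runs, so they cannot be decided by inspecting a finite prefix. Purely as an illustration of how the quantifiers nest, the following sketch evaluates three of them on a bounded horizon, treating "eventually" as "from some time below the horizon onward". The horizon, the function names, and the example F and H are assumptions of this sketch, not part of the model.

```python
# Sketch: finite-horizon reading of three of the properties, with H(q, t) as in the formulas.

def crashed(F, horizon):
    """Union of F(t) over the observed times 0 .. horizon-1."""
    out = set()
    for t in range(horizon):
        out |= F(t)
    return out


def strong_completeness(F, H, procs, horizon):
    """From some time on, every crashed p is suspected by every correct q."""
    cr = crashed(F, horizon)
    co = procs - cr
    return any(
        all(p in H(q, t) for p in cr for q in co for t in range(t_hat, horizon))
        for t_hat in range(horizon))


def strong_accuracy(F, H, procs, horizon):
    """No alive process is ever suspected by an alive process."""
    return all(p not in H(q, t)
               for t in range(horizon)
               for p in procs - F(t)
               for q in procs - F(t))


def eventual_strong_accuracy(F, H, procs, horizon):
    """From some time on, no correct process is suspected by a correct process."""
    co = procs - crashed(F, horizon)
    return any(
        all(p not in H(q, t) for t in range(t_hat, horizon) for p in co for q in co)
        for t_hat in range(horizon))


# Hypothetical run prefix: process 3 crashes at time 5 and is suspected by all
# from time 8 on; process 1 wrongly suspects process 2 once, at time 2.
PROCS = {1, 2, 3}


def F(t):
    return {3} if t >= 5 else set()


def H(q, t):
    out = {3} if t >= 8 else set()
    if q == 1 and t == 2:
        out = out | {2}
    return out


print(strong_completeness(F, H, PROCS, 20))
print(strong_accuracy(F, H, PROCS, 20))
print(eventual_strong_accuracy(F, H, PROCS, 20))
```

On this prefix the history behaves like an eventually perfect detector: the single early mistake violates strong accuracy but not its eventual variant.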

Another important contribution found in [CT96] is the concept of reducibility
between FDs. Essentially, it studies reduction algorithms $T_{\mathcal{D} \to \mathcal{D}'}$ that transform
the outputs of $\mathcal{D}$ into outputs of $\mathcal{D}'$. As a consequence, any problem that can be
solved using $\mathcal{D}'$ can also be solved using $\mathcal{D}$, written $\mathcal{D} \succeq \mathcal{D}'$. If such a relation
holds in both directions, then we write $\mathcal{D} \cong \mathcal{D}'$. Interestingly, FDs with either
strong or weak completeness are not that much different with respect to their
ability to solve problems: $\mathcal{P} \cong \mathcal{Q}$, $\mathcal{S} \cong \mathcal{W}$, $\Diamond\mathcal{P} \cong \Diamond\mathcal{Q}$, $\Diamond\mathcal{S} \cong \Diamond\mathcal{W}$.

2.4 Another Prominent Candidate: Ω

In this subsection, we try to explain that completeness is required in the (F, H)-based
model used by Chandra and Toueg only because this model is unrealistically
rich, which we might regard as a deficiency of the model.

Without the completeness property, the (F, H)-based model allows a FD
to be incomplete, i.e., to indefinitely not suspect a crashed process. We may
conceive this as unrealistic if we, for instance, assume that FDs work with
timeouts. Given that any crashed process can only have sent a finite number of
messages before having crashed, any FD will be reporting suspicion of a crashed
process at the latest after “timeout” times “number of sent messages” units of
time.

It is instructive to replay the argument in the dual model of “presence
detectors”, where the outputs of FDs are inverted, i.e., where H tells which processes
are currently “trusted” by the FD. Intuitively, this dual model feels more direct
since the trust in a process to be alive is often based on “feeling its heartbeat”.
Of course, since it is just a mathematical dual, this model is also too rich: an
incomplete FD may be expressed by always listing a crashed process as being
trusted^. However, it is easy to constrain this model so as to avoid incomplete FDs.

^ In this model, completeness more intuitively specifies the natural requirement that
“you cannot feel the heartbeat of a crashed process infinitely long”. Strong
completeness makes sense in precisely this respect. The intuition of weak completeness
is less clear: why should the heartbeat argument apply to only one process?
Table 3. Operational Semantics for the “Presence” Detector $\Omega$ of [CHT96]

($\Omega$-env) = (T-env)        ($\Omega$-tau) = (T-tau)

$$
(\Omega\text{-suspect})\quad
\frac{(t,F,H) = \Gamma \to \Gamma' \qquad N \xrightarrow{\;\mathrm{suspect}\,j@i\;} N' \qquad i \notin F(t) \qquad j \neq H(i,t)}
     {\Gamma \vdash N \to \Gamma' \vdash N'}
$$


Interestingly, the detector $\Omega$, as introduced in another paper by Chandra
and Toueg, jointly with Hadzilacos [CHT96], represents one particular model
variant of presence detectors that is “sufficiently poor” to render incomplete
FDs impossible. With $\Omega$, every FD at any moment in time outputs only a single
process that is believed to be “correct”† or trusted: $H : P \times T \to P$.

($\Omega$) Eventually, all correct processes always trust the same correct process.

$$\exists \hat{t} \in T : \exists q \in \mathrm{correct}(F) : \forall p \in \mathrm{correct}(F) : \forall t \geq \hat{t} : H(p,t) = q$$

Observe why it is no longer possible to indefinitely enforce trust on crashed
processes: since the output of H contains only a single process, the associated
process can always suspect any of the n−1 other processes. Under the property
above, the output eventually stabilizes on a single correct process, which then
permanently allows the suspicion of all remaining processes.

It is straightforward to provide an operational semantics view for $\Omega$ by adapting
the previous configurations to the new type of H. The rule ($\Omega$-suspect) in
Table 3 shows the duality of presence detectors versus failure detectors: the
condition on a suspected j is inverted.

Definition 2. A $\mathbb{T}(\Omega)$-run is an infinite sequence $((t,F,H) \vdash N_t)_{t \in T}$ generated
by ($\Omega$-env), ($\Omega$-tau), and ($\Omega$-suspect), for some F, H with $H \in \Omega(F)$.

The FD $\Omega$ was introduced only as an auxiliary concept in the proof that $\Diamond\mathcal{W}$ is
the weakest FD solving Consensus, which works because $\Omega \cong \Diamond\mathcal{W}$.

3 Proposal of a New Model of FDs 

There are a number of observations on the (F, H)-based specification of FDs
that motivate us to do it differently.

From Static to Dynamic. The use of (F, H) as “predicting” the failures of
processes and their detection by others in a run appears to be counterintuitive
from the point of view of programming language semantics where
events of a computation are to happen dynamically and possibly
nondeterministically.

† Originally called this way in [CHT96], but it differs from the notion of a correct
process, which is precisely about a full run, and not just one moment in it.
From Failure Detection to Presence Detection. In Subsection 2.4, we
mentioned the problem of completeness, which is inherent in the (F, H)-model,
and how $\Omega$ does a good job in avoiding the problem in a poorer dual
model. We propose to use similar ideas also for FDs that are not equivalent
to $\Omega$.

Intuition Mismatch. The use of a total function H to model outputs of FD 
modules counters the intuition that a process crash should imply the crash 
of its attached FD at the same time. This could be repaired by making H a 
partial function that is only defined when the respective process has not yet 
crashed. Below, we go even further and replace the H completely. 

We do not see the need to record in H an abundance of unreliable information. 
The quest to model a minimal amount of information has two consequences. 

From Unstable to Stable. The use of unreliable FD-outputs listed in H very 
carefully models a lot of unstable (and therefore questionable!) information, 
namely information that changes nondeterministically over time. However, 
in order to characterize the above six FD properties, only (eventually) stable 
information is used. We therefore propose to model only this kind of stable 
information — once it has become stable — and to freely allow any suspicions 
for which there is no stable information yet, as in the “poor” output of $\Omega$.
From Local to Global. Chandra and Toueg intended to give an abstract 
model of FDs, but we feel that by attaching individual modules to every 
process it is still too concrete, i.e., close to implementation aspects. Fur- 
thermore, the above completeness and accuracy properties are all defined 
globally on the set of all FD modules, not on individual ones. Thus, we 
rather propose to model FDs as a single global entity and have all processes 
share access to it. As a side-effect, the freedom (= “imprecision”) in the formulation
of FD properties with respect to the proper choice of the “accuracy
set” disappears.

Summing up, we seek to model the environment of process networks dynamically 
as a global device that exclusively stores stable information. But, apart from 
crashes that occur irrevocably, which information should this precisely be? 

Looking again at the previous (F, H)-based FDs, the principle behind the
more complicated notions of accuracy seems to be that of “justified trust”.
Correct processes, i.e., those that, according to F, were immortal in the given run,
are trusted forever (according to H) in the given run, either eventually or
already from the very beginning. If, in some dynamic operational semantics
scenario, we want to model the moment when such a process becomes trusted, we
must ensure that this process does not crash afterwards: it must become immortal at
this very moment. Then, we call such a process trusted-immortal.

4 An Operational Semantics View of the New Model 

As motivated in the previous section, we propose a new model, defined by
its operational semantics, that can be used to represent all of the FDs of
[CT96] solely based on stable/reliable information that is not fixed before a
run starts, but is dynamically appearing along its way. It turns out that two
kinds of information suffice: (1) which processes have crashed, and (2) which
processes have become trusted-immortal. Both kinds of information may occur
at any moment in time, and they remain irrevocable in any continuation of the
current run.

Table 4. Operational Semantics Scheme with Reliable Information

$$
(\text{D-env})\quad
\frac{(\mathrm{TI} \cup \mathrm{TI}') \cap C' = \emptyset \qquad (C \cup C') \cap \mathrm{TI}' = \emptyset \qquad |C \cup C'| \leq \mathrm{maxfail}(n)}
     {(\mathrm{TI}, C) \to (\mathrm{TI} \uplus \mathrm{TI}',\; C \uplus C')}
$$

$$
(\text{D-tau})\quad
\frac{(\mathrm{TI}, C) = \Gamma \to \Gamma' \qquad N \xrightarrow{\;\tau@i\;} N' \qquad i \notin C}
     {\Gamma \vdash N \to \Gamma' \vdash N'}
$$

$$
(\mathcal{X}\text{-suspect})\quad
\frac{(\mathrm{TI}, C) = \Gamma \to \Gamma' \qquad N \xrightarrow{\;\mathrm{suspect}\,j@i\;} N' \qquad i \notin C \qquad \mathrm{condition}_{\mathcal{X}}(\Gamma, j)}
     {\Gamma \vdash N \to \Gamma' \vdash N'}
$$

We use the symbol D to recall the softer more dynamic character as opposed 
to time T just passing for some predefined crash and detection schedule. 



Modeling Stable Reliable Information. Rule (D-env) in Table 4 precisely
models the nondeterministic appearance of crashed and trusted-immortal processes
in full generality. Environments $\Gamma = (\mathrm{TI}, C) \in 2^P \times 2^P$ record sets TI of
trusted-immortal processes and sets C of crashed processes. In a single step, an
environment may be increased by further trusted-immortal processes ($\in \mathrm{TI}'$) and
further crashed processes ($\in C'$). The two empty-intersection conditions on $\mathrm{TI}'$
and $C'$ assure a simple sanity property: processes shall not be crashed and
trusted-immortal at the same time. Note that we also assume the operator $\uplus$
to imply the empty intersection of its operands: processes may crash or become
trusted-immortal only once. The condition concerning maxfail(n) is obvious: we
should not have more processes crash than permitted. The sets $\mathrm{TI}'$ and $C'$ may
both be empty, which implies that the environment may also do idle steps; this is
necessary for runs whose number of steps is greater than the number of processes,
like in the infinite runs that we are looking at.

Rule (D-tau) straightforwardly permits actions $\tau@i$ if $i \notin C$. Rule ($\mathcal{X}$-suspect)
requires in addition that the suspected process j is permitted to be
suspected by $\Gamma$; the precise condition depends on the FD accuracy that we intend to model.
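
The environment step of Table 4 can be made concrete with a small sketch. The following Python fragment is only an illustration under stated assumptions: the names `Env`, `env_step` and `tau_enabled` are hypothetical, processes are represented by arbitrary hashable values, and `maxfail` is passed as a plain integer rather than as a function of n.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Env:
    """An environment (TI, C): trusted-immortal and crashed processes."""
    ti: frozenset
    crashed: frozenset


def env_step(env, new_ti, new_crashed, maxfail):
    """Apply one environment step in the spirit of (D-env).

    Returns the successor environment, or raises ValueError when the
    side conditions are violated. Hypothetical helper, not from the paper.
    """
    new_ti, new_crashed = frozenset(new_ti), frozenset(new_crashed)
    # Disjoint union: a process crashes or becomes trusted-immortal only once.
    if new_ti & env.ti or new_crashed & env.crashed:
        raise ValueError("process already recorded in the environment")
    # Sanity: nobody is crashed and trusted-immortal at the same time.
    if (env.ti | new_ti) & new_crashed or (env.crashed | new_crashed) & new_ti:
        raise ValueError("process would be crashed and trusted-immortal")
    # Never crash more processes than permitted.
    if len(env.crashed | new_crashed) > maxfail:
        raise ValueError("too many crashed processes")
    return Env(env.ti | new_ti, env.crashed | new_crashed)


def tau_enabled(env, i):
    """Side condition of (D-tau): only processes that have not crashed may act."""
    return i not in env.crashed
```

Calling `env_step` with two empty sets models an idle step, which returns an environment equal to the one passed in.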



Failure Detection. In our model, trusted-immortal processes are intended to
be never again suspected by any other process. In Table 5, we specify different
incarnations of the rule ($\mathcal{X}$-suspect) that are targeted at the various
notions of accuracy that our FDs are intended to satisfy.
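Table 5. Operational Semantics with Reliable Detectors

$$
(\mathcal{P}/\mathcal{Q}\text{-suspect})\quad
\frac{(\mathrm{TI}, C) = \Gamma \to \Gamma' \qquad N \xrightarrow{\;\mathrm{suspect}\,j@i\;} N' \qquad i \notin C \qquad j \in C}
     {\Gamma \vdash N \to \Gamma' \vdash N'}
$$

$$
(\mathcal{S}/\mathcal{W}\text{-suspect})\quad
\frac{(\mathrm{TI}, C) = \Gamma \to \Gamma' \qquad N \xrightarrow{\;\mathrm{suspect}\,j@i\;} N' \qquad i \notin C \qquad j \notin \mathrm{TI} \neq \emptyset}
     {\Gamma \vdash N \to \Gamma' \vdash N'}
$$

$$
(\Diamond\text{-suspect})\quad
\frac{(\mathrm{TI}, C) = \Gamma \to \Gamma' \qquad N \xrightarrow{\;\mathrm{suspect}\,j@i\;} N' \qquad i \notin C \qquad j \notin \mathrm{TI}}
     {\Gamma \vdash N \to \Gamma' \vdash N'}
$$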

Strong Accuracy (as in $\mathcal{P}/\mathcal{Q}$) can be expressed very simply and directly in our
environment model, because it does not explicitly talk about correct processes.
Rule ($\mathcal{P}/\mathcal{Q}$-suspect) says precisely that “no site is suspected before it has
crashed” by requiring that any suspected process j must be part of the set C.
Note that the component TI is not used at all; if we were interested in just strong
accuracy, it would suffice to record information about crashed processes.

Definition 3. A $\mathbb{D}(\mathcal{P}/\mathcal{Q})$-run is an infinite sequence $(\Gamma_t \vdash N_t)_{t \in T}$
generated by (D-env), (D-tau), and ($\mathcal{P}/\mathcal{Q}$-suspect).

Weak Accuracy (as in $\mathcal{S}/\mathcal{W}$) builds on rule ($\mathcal{S}/\mathcal{W}$-suspect). In order to get
that “some correct process is never suspected”, the idea is that some process
must become trusted-immortal before any suspicion in the system may take
place. A process i may always suspect process j unless the failure detector tells
otherwise, i.e., unless it imposes to trust j by $j \in \mathrm{TI}$. Note that if we allowed
suspicions before the “election” of at least one trusted-immortal, then even a
process becoming trusted-immortal later on might have been suspected before.

Definition 4. A $\mathbb{D}(\mathcal{S}/\mathcal{W})$-run is an infinite sequence $(\Gamma_t \vdash N_t)_{t \in T}$
generated by (D-env), (D-tau), and ($\mathcal{S}/\mathcal{W}$-suspect).

The other versions of “eventual accuracy” cannot be expressed solely by 
operational semantics rules; additional liveness properties are required. 

Eventual Weak Accuracy (as in $\Diamond\mathcal{S}/\Diamond\mathcal{W}$) builds on rule ($\Diamond$-suspect), which
is a slightly more liberal variant of ($\mathcal{S}/\mathcal{W}$-suspect): suspicions may take place
without some process having become trusted-immortal. However, we need to add
to the condition on runs that eventually at least one process indeed turns out
to be trusted-immortal such that it cannot be suspected afterwards. The detector $\Omega$
of [CHT96] is very close to this very intuition, as well (confirming $\Omega \cong \Diamond\mathcal{W}$).

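Definition 5. A $\mathbb{D}(\Diamond\mathcal{S}/\Diamond\mathcal{W})$-run is an infinite sequence $(\Gamma_t \vdash N_t)_{t \in T}$
generated by (D-env), (D-tau), and ($\Diamond$-suspect),
where there is a reachable state $(\mathrm{TI}_{\hat{t}}, C_{\hat{t}}) \vdash N_{\hat{t}}$ with $\mathrm{TI}_{\hat{t}} \neq \emptyset$.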
Eventual Strong Accuracy (as in $\Diamond\mathcal{P}/\Diamond\mathcal{Q}$) is a nuance trickier: like its weak
counterpart, it builds directly on rule ($\Diamond$-suspect) and adds to it a condition on
runs, but now a more restrictive one: in any run, there must be a state $\Gamma = (\mathrm{TI}, C)$
with $\mathrm{TI} \cup C = P$. In such a state, all decisions about correctness and crashes have
been taken. This witnesses that $\Diamond\mathcal{P}$ is called eventually perfect [CT96]: in fact,
the condition $j \notin \mathrm{TI}$ becomes equivalent to the perfect condition $j \in C$ of $\mathcal{P}/\mathcal{Q}$.

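Definition 6. A $\mathbb{D}(\Diamond\mathcal{P}/\Diamond\mathcal{Q})$-run is an infinite sequence $(\Gamma_t \vdash N_t)_{t \in T}$
generated by (D-env), (D-tau), and ($\Diamond$-suspect),
where there is a reachable state $(\mathrm{TI}_{\hat{t}}, C_{\hat{t}}) \vdash N_{\hat{t}}$ with $\mathrm{TI}_{\hat{t}} \cup C_{\hat{t}} = P$.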

Note that we did not explicitly mention completeness properties in our
redefinitions. In fact, as inspired by $\Omega$ (see Subsection 2.4), they are built-in
implicitly. With the rules ($\mathcal{S}/\mathcal{W}$-suspect) and ($\Diamond$-suspect), the suspicion of a
crashed process is always allowed. With rule ($\mathcal{P}/\mathcal{Q}$-suspect), the suspicion of
a crashed process is allowed immediately after it crashed. Note that completeness
is thus provided in the strongest possible manner, strictly implying strong
completeness. It does not only hold eventually, but “as soon as possible”,
subject only to accuracy constraints. This built-in strength is a consequence of the
principle of our model to store only stable information to govern the possibility
of suspicions, leaving complete freedom to suspect in all those cases where the
fate of the suspected process has not yet been decided on in the current run.
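
The three suspicion conditions and the two additional run conditions can be summarised in a short executable sketch. This is a hedged illustration only: the function names and the string tags `"P/Q"`, `"S/W"` and `"<>"` are hypothetical, and the run conditions are checked on a finite prefix of environments, which can witness that a suitable state is reached but can never refute it for an infinite run.

```python
def may_suspect(fd, ti, crashed, i, j):
    """May process i suspect process j in environment (ti, crashed)?

    fd selects the rule variant: "P/Q" (strong accuracy),
    "S/W" (weak accuracy) or "<>" (both eventual variants).
    """
    if i in crashed:              # common premise: i has not crashed
        return False
    if fd == "P/Q":               # j must already have crashed
        return j in crashed
    if fd == "S/W":               # some process is trusted-immortal, and j is not
        return bool(ti) and j not in ti
    if fd == "<>":                # j is not trusted-immortal
        return j not in ti
    raise ValueError(f"unknown detector variant: {fd}")


def eventually_weak_ok(prefix):
    """Run condition for eventual weak accuracy: some state has a non-empty TI."""
    return any(len(ti) > 0 for ti, _ in prefix)


def eventually_strong_ok(prefix, processes):
    """Run condition for eventual strong accuracy: some state has TI ∪ C = P."""
    return any(set(ti) | set(crashed) == set(processes) for ti, crashed in prefix)
```

In all three variants a crashed process j is never in TI, so once the accuracy premise is met nothing prevents its suspicion; this mirrors the built-in completeness discussed above.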

Having redefined runs that are allowed for particular FDs, we must of course 
also argue that they correspond to the original counterparts of Chandra and 
Toueg. This we do formally in the following section. 

5 Validation of the New Model 

We compare our D-representations of FDs with the T-representations proposed in 
[CT96] extensionally through mutual “inclusion” of their sets of runs. Essentially, 
we are looking for a mutual simulation of T-runs and D-runs sharing the same 
network run (by projecting onto the N-component). To this aim, it will be crucial
to formally relate the respective notions of environment. 

Definition 7. $(\mathrm{TI}, C)$ corresponds to $(t,F,H)$, written $(\mathrm{TI}, C) \sim (t,F,H)$, if

1. $C = F(t)$, and

2. $j \in \mathrm{TI}$ implies that $\forall i \in A_t : \forall t' \geq t : j \notin H(i,t')$,

where $A_t := P \setminus F(t)$ or $A_t := \mathrm{correct}(F)$, depending on the respective “accuracy set”
(see Section 2) of the version of accuracy that we are considering.

It is also convenient to use case 2 in the opposite direction for the case of $t' = t$:
If $(\mathrm{TI}, C) \sim (t,F,H)$, then $\forall i \in A_t : \forall j : j \in H(i,t)$ implies that $j \notin \mathrm{TI}$.

Example 2. For all F, H: $(\emptyset, F(0)) \sim (0,F,H)$. This correspondence holds,
because $C = F(0)$ is defined to satisfy case 1, and $\mathrm{TI} = \emptyset$ trivially implies case 2.
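
To make the correspondence relation tangible, it can be checked mechanically on a bounded time window. The sketch below rests on several assumptions that are not part of the definition: the name `corresponds` is hypothetical, F and H are passed as Python callables, the accuracy set is supplied explicitly as `observers`, and the quantification over all later times is truncated at a finite `horizon`.

```python
def corresponds(ti, crashed, t, F, H, observers, horizon):
    """Bounded check of (TI, C) ~ (t, F, H).

    F(u) returns the set of processes crashed at time u.
    H(i, u) returns the set of processes suspected by i at time u.
    observers is the chosen accuracy set; horizon bounds the times checked.
    """
    # Case 1: the crashed sets agree at time t.
    if set(crashed) != set(F(t)):
        return False
    # Case 2: trusted-immortal processes are not suspected from t onwards.
    return all(
        j not in H(i, u)
        for j in ti
        for i in observers
        for u in range(t, horizon + 1)
    )
```

With an empty TI component, case 2 holds vacuously, which is the situation of Example 2.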
We present the main theorems in a generic manner. Let $\mathcal{D}^s \in \{\mathcal{P}, \mathcal{S}, \Diamond\mathcal{P}, \Diamond\mathcal{S}\}$
and $\mathcal{D}^w \in \{\mathcal{Q}, \mathcal{W}, \Diamond\mathcal{Q}, \Diamond\mathcal{W}\}$ denote the FDs with respect to strong and weak
completeness. We use $\mathcal{D}^s/\mathcal{D}^w$ to conveniently denote the respective variants.
First, we offer a rather obvious observation. 

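Lemma 1. Let $\mathcal{D}^s/\mathcal{D}^w \in \{\mathcal{P}/\mathcal{Q}, \mathcal{S}/\mathcal{W}, \Diamond\mathcal{P}/\Diamond\mathcal{Q}, \Diamond\mathcal{S}/\Diamond\mathcal{W}\}$.

Every $\mathbb{T}(\mathcal{D}^s)$-run is also a $\mathbb{T}(\mathcal{D}^w)$-run.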

Proof. Trivial, because strong completeness implies weak completeness. □ 

Now, we show the main mutual simulation theorem. It also underlines the 
fact that the kind of completeness does not really matter much, which confirms 
the respective result of [CHT96], that the eight FDs collapse into just four. 

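Theorem 1. Let $\mathcal{D}^s/\mathcal{D}^w \in \{\mathcal{P}/\mathcal{Q}, \mathcal{S}/\mathcal{W}, \Diamond\mathcal{P}/\Diamond\mathcal{Q}, \Diamond\mathcal{S}/\Diamond\mathcal{W}\}$.

1. If $((t,F,H) \vdash N_t)_{t \in T}$ is a $\mathbb{T}(\mathcal{D}^w)$-run,
then there is $(\Gamma_t)_{t \in T}$ such that $(\Gamma_t \vdash N_t)_{t \in T}$ is a $\mathbb{D}(\mathcal{D}^s/\mathcal{D}^w)$-run.

2. If $(\Gamma_t \vdash N_t)_{t \in T}$ is a $\mathbb{D}(\mathcal{D}^s/\mathcal{D}^w)$-run,
then there are F, H such that $((t,F,H) \vdash N_t)_{t \in T}$ is a $\mathbb{T}(\mathcal{D}^s)$-run.

Note that part 1 requires only a $\mathbb{T}(\mathcal{D}^w)$-run, while part 2 provides a $\mathbb{T}(\mathcal{D}^s)$-run,
which is due to the strength of the built-in completeness of the D-model.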

Proof. See Appendix A. 

We also show that $\Omega$ is “equivalent” to $\Diamond\mathcal{S}/\Diamond\mathcal{W}$.

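Theorem 2.

1. If $((t,F,H) \vdash N_t)_{t \in T}$ is a $\mathbb{T}(\Omega)$-run,
then there is $(\Gamma_t)_{t \in T}$ such that $(\Gamma_t \vdash N_t)_{t \in T}$ is a $\mathbb{D}(\Diamond\mathcal{S}/\Diamond\mathcal{W})$-run.

2. If $(\Gamma_t \vdash N_t)_{t \in T}$ is a $\mathbb{D}(\Diamond\mathcal{S}/\Diamond\mathcal{W})$-run,
then there are F, H such that $((t,F,H) \vdash N_t)_{t \in T}$ is a $\mathbb{T}(\Omega)$-run.

This theorem allows us to denote $\mathbb{D}(\Diamond\mathcal{S}/\Diamond\mathcal{W})$-runs as $\mathbb{D}(\Omega)$-runs, and justifies
the model that we used when proving a Consensus algorithm correct in [NFM03].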

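We could prove Theorem 2 “directly” just like we did in the proof of Theorem 1.
However, we can also profit from the work of Chandra and Toueg, whose
results translate into our setting as in Proposition 1 below.

There are algorithmic FD-transformations $T_{\Omega \to \Diamond\mathcal{W}}$ and $T_{\Diamond\mathcal{S} \to \Omega}$ such that

- for all F, H : $H \in \Omega(F)$ implies $T_{\Omega \to \Diamond\mathcal{W}}(H) \in \Diamond\mathcal{W}(F)$, and

- for all F, H : $H \in \Diamond\mathcal{S}(F)$ implies $T_{\Diamond\mathcal{S} \to \Omega}(H) \in \Omega(F)$.

Proposition 1 ([CHT96]). Let R denote $((t,F,H) \vdash N_t)_{t \in T}$.

1. If R is a $\mathbb{T}(\Omega)$-run, then $((t, F, T_{\Omega \to \Diamond\mathcal{W}}(H)) \vdash N_t)_{t \in T}$ is a $\mathbb{T}(\Diamond\mathcal{W})$-run.

2. If R is a $\mathbb{T}(\Diamond\mathcal{S})$-run, then $((t, F, T_{\Diamond\mathcal{S} \to \Omega}(H)) \vdash N_t)_{t \in T}$ is a $\mathbb{T}(\Omega)$-run.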

Proof (of Theorem 2). 

1. By Proposition 1(1) and Theorem 1(1). 

2. By Theorem 1(2) and Proposition 1(2). 

□ 
6 Conclusions

We propose a new model of FDs that we consider easier to understand, easier
to work with, and more natural than the model used by Chandra and Toueg.

— It is arguably easier to understand, because the environment information 
that it provides to check the conditions of the rules of Table 5 is designed to 
be minimal — shared, global, and reliably stable — and to support just those 
moments when suspicions are effectively needed with a maximal flexibility. 
This is in contrast to the (F, H)-based model with individual FD modules
that at any moment in time produce unreliable output (sets of currently 
suspected processes) that their master process may not be interested in for 
a long time; they might even have crashed already. 

It is certainly easier to understand from the computational point of view due 
to the dynamic modeling of events concerning crashes and their detection. 

— It is easier to work with, because it is generally more light-weight in that 
only stable information is considered. Moreover, our model is simpler since 
the strongest possible completeness property is built-in, so we do not have 
to explicitly care about it when, e.g., looking for the weakest FD solving 
a distributed computing problem in our model. Also, to exploit any of the 
accuracy properties in proofs, it suffices to check rather simple syntactic 
conditions in states of a given run. Starting in the initial state, a finite search 
suffices, profiting from the built-in monotonicity of the stable information. 

— It is more natural, for two reasons: (1) It avoids the need to impose additional 
completeness properties by allowing dynamic nondeterminism on suspicions 
until they possibly become forbidden forever. (2) It avoids the problem of 
selecting the “accuracy set” of eventually reliable individual FDs, where the 
(F, H)-based model leaves the choice to FDs of correct, alive, or all processes.
In our model, FD modules are not modeled individually as belonging each 
to individual processes, but failure detection is modeled by using a global 
shared entity. In a dynamic operational scenario as ours, the only reasonable 
choice for the counterpart of the “accuracy set” is the alive processes. 

In this paper, we concentrated on the FDs presented in [CT96,CHT96], but we
see no obstacle in applying our principles of Section 3 to other (F, H)-based FDs.
A Proofs of Theorem 1

Proof. For each case of $\mathcal{D}^s/\mathcal{D}^w$, the proof follows the same pattern.

1. Let $((t,F,H) \vdash N_t)_{t \in T}$ be a $\mathbb{T}(\mathcal{D}^w)$-run. Any step

$$(t,F,H) \vdash N_t \;\to\; (t+1,F,H) \vdash N_{t+1} \qquad (1)$$

must now be simulated by a derivable step

$$\Gamma_t \vdash N_t \;\to\; \Gamma_{t+1} \vdash N_{t+1} \qquad (2)$$

for some $(\Gamma_t)_{t \in T}$. In order to verify the derivability of transition (2), the
semantics tells us to check the two possibilities of action $\tau@i$ and $\mathrm{suspect}\,j@i$
carried out by $N_t$. Naturally, knowing the derivability of transition (1), we
can deduce some knowledge about the output of F and H at time t.

We construct $\Gamma_t$ using this knowledge; formally, we establish $\Gamma_t \sim (t,F,H)$.
To this aim, we prove that the correspondence is preserved under appropriately
defined environment transitions, indicated by $\Gamma_t \mapsto \Gamma_{t+1}$.

Lemma 2. If $\Gamma_t \sim (t,F,H)$ and $\Gamma_t \mapsto \Gamma_{t+1}$, then $\Gamma_{t+1} \sim (t+1,F,H)$.

Proof (of Lemma 2 by construction of $\Gamma_t \mapsto \Gamma_{t+1}$).

In fact, it is never a problem to have $\Gamma_{t+1} \sim (t+1,F,H)$ satisfy its first
condition $C_{t+1} = F(t+1)$ by simply setting $C_{t+1} := F(t+1)$. Note that
since F is steadily increasing, the condition $C \uplus C'$ of (D-env) is satisfied.
In order to have $\Gamma_{t+1} \sim (t+1,F,H)$ satisfy its second condition, we must be
very cautious when we add elements to $\mathrm{TI}_t$ to become $\mathrm{TI}_{t+1}$.

case $\mathcal{D}^w = \mathcal{Q}$: Never add elements to $\mathrm{TI}_{t+1}$. Then any transition $\Gamma_t \mapsto \Gamma_{t+1}$
is derivable, and Lemma 2 holds immediately.

case $\mathcal{D}^w = \mathcal{W}$: Here, we may assume weak accuracy: some correct process p
is never suspected. We simply set $\mathrm{TI}_0 := \{p\}$ to get the desired effect,
and afterwards never again change the $\mathrm{TI}_t$ component.

Note that with $\Gamma_0 := (\{p\}, F(0))$, we have $\Gamma_0 \sim (0,F,H)$ precisely due
to the weak accuracy assumption of the $\mathbb{T}(\mathcal{W})$-run.

Note that then, as a consequence, Lemma 2 immediately holds for all
transitions (of all $t \geq 0$). Of course, it is also needed in these transitions to
check that p will not accidentally be chosen to crash; this is guaranteed,
because weak accuracy requires a correct process, which is thus not in
crashed(F) and will therefore never enter any $C_t$.

case $\mathcal{D}^w = \Diamond\mathcal{W}$: Here, we may assume eventual weak accuracy: eventually,
after time $\hat{t}$, some correct process p will never again be suspected, so we
set $\mathrm{TI}_0 = \cdots = \mathrm{TI}_{\hat{t}-1} = \emptyset$ and $\mathrm{TI}_{\hat{t}} := \{p\}$.

The critical transition for Lemma 2 to hold is $\Gamma_{\hat{t}-1} \mapsto \Gamma_{\hat{t}}$. Here, the eventual
weak accuracy property makes $\Gamma_{\hat{t}} \sim (\hat{t},F,H)$ satisfy condition 2.

case $\mathcal{D}^w = \Diamond\mathcal{Q}$: Here, we may assume eventual strong accuracy: eventually,
after time $\hat{t}$, no correct process p will ever again be suspected, so we set
$\mathrm{TI}_0 = \cdots = \mathrm{TI}_{\hat{t}-1} = \emptyset$ and $\mathrm{TI}_{\hat{t}} := \mathrm{correct}(F)$.

The argument for Lemma 2 to hold is a replay of the previous case. □
The “constructive preservation” property of Lemma 2 provides us with the
required assumptions to simulate subsequent steps and, thus, allows us to
iteratively simulate the whole infinite run, starting in all cases but one at
$\Gamma_0 = (\emptyset, F(0))$. Let us first look at individual simulation steps.

Lemma 3. If $\Gamma_t \sim (t,F,H)$ and transition (1), then transition (2).

Proof (of Lemma 3). Check the conditions to derive transition (1), either
due to (T-tau) or due to (T-suspect), and then observe that the correspondence
of environments also enables us to derive the respective transition (2).
If transition (1) required $i \notin F(t)$, which it does in both (T-tau) and
(T-suspect), then the first condition on $\sim$ provides us with the required $i \notin C_t$
in (D-tau) and in any of the suspect rules of Table 5. We may then focus on the
more interesting accuracy-specific condition of these suspect rules.

case $\mathcal{D}^w = \mathcal{Q}$: The enabling conditions for ($\mathcal{P}/\mathcal{Q}$-suspect) only depend on
the respective C and hold trivially.

case $\mathcal{D}^w = \mathcal{W}$: Recall that the second condition on $\sim$ tells us that $j \in H(i,t)$
implies $j \notin \mathrm{TI}_t$. By definition of $\Gamma_t \mapsto \Gamma_{t+1}$ for the case $\mathcal{W}$,
$\mathrm{TI}_t \neq \emptyset$ also holds for all $t \geq 0$. Together, this ($j \notin \mathrm{TI}_t \neq \emptyset$) implies that
suspicion steps can always be simulated using ($\mathcal{S}/\mathcal{W}$-suspect).

cases $\mathcal{D}^w = \Diamond\mathcal{W}$ and $\mathcal{D}^w = \Diamond\mathcal{Q}$: Again, $j \in H(i,t)$ implies $j \notin \mathrm{TI}_t$. This is
already sufficient to simulate suspicion steps using ($\Diamond$-suspect). □

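The basic requirement on $\mathbb{D}(\mathcal{D}^s/\mathcal{D}^w)$-runs (matching Definitions 3-6) is
to consist of sequences of derivable transitions. This holds in all cases by the
infinite iteration of Lemma 3. However, the $\Diamond$-runs (i.e., the $\mathbb{D}(\Diamond\mathcal{S}/\Diamond\mathcal{W})$-runs
and $\mathbb{D}(\Diamond\mathcal{P}/\Diamond\mathcal{Q})$-runs) require an additional condition.

case $\mathcal{D}^w = \Diamond\mathcal{W}$: The resulting run is a $\mathbb{D}(\Diamond\mathcal{S}/\Diamond\mathcal{W})$-run, because there is $\hat{t}$
in which we set $\mathrm{TI}_{\hat{t}} := \{p\}$.

case $\mathcal{D}^w = \Diamond\mathcal{Q}$: The resulting run is a $\mathbb{D}(\Diamond\mathcal{P}/\Diamond\mathcal{Q})$-run, because there is $\hat{t}$
in which we set $\mathrm{TI}_{\hat{t}} := \mathrm{correct}(F)$. However, this is not necessarily yet
the moment needed to establish a $\mathbb{D}(\Diamond\mathcal{P}/\Diamond\mathcal{Q})$-run. In order to find this
moment $\tilde{t}$, we only have to wait until all the processes in crashed(F)
have actually crashed. By our definition of $C_t$, the required property of
Definition 6 that $\mathrm{TI}_{\tilde{t}} \cup C_{\tilde{t}} = P$ becomes valid.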

2. The structure of this proof is similar to the previous one.

Let $R := (\Gamma_t \vdash N_t)_{t \in T}$ be a $\mathbb{D}(\mathcal{D}^s/\mathcal{D}^w)$-run with $\Gamma_t = (\mathrm{TI}_t, C_t)$ for $t \in T$.

Before, we constructed a sequence of $\Gamma_t$ for some fixed F, H such that
$\Gamma_t \sim (t,F,H)$ for all $t \geq 0$. Here, we construct the functions $F_R$, $H_R$ with
$H_R \in \mathcal{D}^s(F_R)$ by means of the information found in the various $\Gamma_t$. While it is
obvious that $F_R$ should very closely follow the information recorded in C,
there is a lot of freedom in the choice of $H_R$ (allowing suspicions or not),
because it is supposed to contain a lot of information that is never checked
in the projection $(N_t)_{t \in T}$ of R. We are going to choose $H_R$ to permit the
maximum amount of suspicions. As a consequence, this choice gives us the
strongest possible completeness property essentially for free.
Independent of the FD, we may transform any given D-run into a T-run by
constructing $F_R$ in a uniform manner:

$$\forall t \in T : F_R(t) := C_t$$

For the construction of $H_R$, we need to distinguish among the cases of $\mathcal{D}^s$
between the perfect FD ($\mathcal{P}$) and the imperfect FDs ($\mathcal{S}$, $\Diamond\mathcal{P}$, $\Diamond\mathcal{S}$).

For the perfect FD, we set:

$$\forall t \in T : \forall i \in P : H_R(i,t) := C_t$$

For the imperfect FDs, we need some auxiliary functions. 

Let R denote the run $(\Gamma_t \vdash N_t)_{t \in T}$.

Let $\mathrm{ti}(R) := \bigcup_{t \in T} \mathrm{TI}_t$ denote the set of trusted-immortal processes of R.

If $j \in \mathrm{ti}(R)$, then let $t_j$ be (uniquely!) defined by $j \notin \mathrm{TI}_{t_j - 1}$ and $j \in \mathrm{TI}_{t_j}$,
thus denoting the moment in which j becomes trusted-immortal.

To define Hr, we start by allowing suspicions of every process by every 
other process at any time. (We may, of course, leave out useless permissions 
to self-suspect, but this does not matter for the result.) 

$$\forall t \in T : \forall i \in P : H_R(i,t) := P$$

From these sets, we subtract (in imperative programming style, with $\forall$
denoting forall loops) a number of processes, depending on the time at which
trusted-immortal processes have become so.

$$\forall j \in \mathrm{ti}(R) : \forall i \in P : \forall t \geq t_j : H_R(i,t) := H_R(i,t) \setminus \{j\}$$

By construction, we immediately get that $\Gamma_t \sim (t, F_R, H_R)$.

The correspondence is also preserved by the environment transitions of R.

Lemma 4. If $\Gamma_t \sim (t, F_R, H_R)$ and $\Gamma_t \to \Gamma_{t+1}$, then $\Gamma_{t+1} \sim (t+1, F_R, H_R)$.

As before, we need a simulation lemma; now, it addresses the other direction.

Lemma 5. If $\Gamma_t \sim (t, F_R, H_R)$ and transition (2), then transition (1).

It holds for symmetric reasons, because it is also exploiting the same corre- 
spondence properties of the constructed pair Fr, Hr. 

The only difference (in fact, a simplification) in the final iteration of the
simulation is in the case of the $\mathbb{D}(\Diamond\mathcal{P}/\Diamond\mathcal{Q})$-run. Here, the moment $\hat{t}$ that
according to Definition 6 shows $\mathrm{TI}_{\hat{t}} \cup C_{\hat{t}} = P$ is precisely the earliest moment
that provides weak accuracy for the $\mathbb{T}(\Diamond\mathcal{P})$-run.

Now, the remaining argument is to show that the components $(F_R, H_R)$ of a
$\mathbb{T}(\mathcal{D}^s)$-run also satisfy strong completeness. In fact, by construction, we have
the following completeness property for the case of imperfect FDs:
in every run,
every crashed process is always suspected by every process.

This obviously strictly implies strong completeness. The case of perfect FDs
is similar, just replacing the words “always suspected” by the words “suspected
right after it crashed”, which also strictly implies strong completeness. □
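
The construction of $F_R$ and $H_R$ in part 2 of the proof can be sketched for a finite prefix of a D-run. The following Python fragment is an illustration under assumptions: the name `build_history` is hypothetical, the run prefix is given as a list of pairs $(\mathrm{TI}_t, C_t)$, and the resulting functions are returned as dictionaries. Because the TI component only grows along a run, removing each j from all $H_R(i,t)$ with $t \geq t_j$ amounts to setting $H_R(i,t)$ to $P \setminus \mathrm{TI}_t$, which is what the sketch computes.

```python
def build_history(envs, processes, perfect=False):
    """Build (F_R, H_R) from a finite prefix of environments.

    envs[t] is the pair (TI_t, C_t) of sets at time t.
    F_R maps t to the crashed set; H_R maps (i, t) to the suspected set.
    """
    all_procs = frozenset(processes)
    F_R, H_R = {}, {}
    for t, (ti, crashed) in enumerate(envs):
        F_R[t] = frozenset(crashed)                  # F_R(t) := C_t
        for i in all_procs:
            if perfect:
                H_R[(i, t)] = frozenset(crashed)     # perfect FD: suspect exactly C_t
            else:
                # imperfect FDs: suspect everybody except trusted-immortals
                H_R[(i, t)] = all_procs - frozenset(ti)
    return F_R, H_R
```

In the imperfect case every crashed process is in every suspected set, since crashed processes are never trusted-immortal; in the perfect case a process is suspected from the step at which it is recorded as crashed.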
Abstract. We describe a new network service, the “ticket server”. This service
provides “tickets” that a client can attach to a request for a network service
(such as sending email or asking for a stock quote). The recipient of such a request
(such as the email recipient or the stockbroker) can use the ticket server
to verify that the ticket is valid and that the ticket hasn’t been used before. Clients
can acquire tickets ahead of time, independently of the particular network
service request. Clients can maintain their stock of tickets either on their own
storage, or as a balance recorded by the ticket server. Recipients of a request
can tell the ticket server to refund the attached ticket to the original client, thus
incrementing the client’s balance at the ticket server. For example, an email recipient
might do this if the email wasn’t spam. This paper describes the functions
of the ticket server, defines a cryptographic protocol for the ticket server’s
operations, and outlines an efficient implementation for the ticket server.



1 Motivation 

Several popular network services today have the characteristic that the person using 
the service doesn’t pay for it, even though the provider of the service incurs real 
costs. In a variety of cases this largesse has led to abuse of the services. The primary 
example is email, where the result is the spam business. Other examples include sub- 
mitting URL’s to a web indexing service, or creating accounts at a free service such 
as Yahoo or Hotmail, or even in some cases reading a web site (which can be abused, 
for example, by parsing out the web site’s contents and presenting them as part of 
another site, without permission). 

To appreciate the scale and difficulty of the spam problem, consider some statis- 
tics. In February 2003 AOL reported that their spam prevention system was detecting, 
and suppressing, 780 million incoming spam emails per day for their 27 million users 
[5]. On August 4th 2003 Hotmail, with 158 million active user accounts, received 2.6
billion emails (plus 39 million from within Hotmail). 2.1 billion (78%) of the emails 
were mechanically classified as spam, roughly 24,000 per second. That same day 
Hotmail also had more than 1 million new accounts created — most likely those were 
not all for normal users. Today, spam is extremely cheap for the sender. If you search 
at Google for “bulk email” you will find organizations willing to deliver your email 
to one million addresses (provided by them) for a total cost of $190 (i.e., 0.019 cents
per message). Accordingly, there is a lot of current work on techniques for deterring 
spam [4, 9, 12, 13, 15, 20]. 

This paper is about one technology for preventing such abuse, which we call the 
“ticket server”. For ease of exposition in this paper, and because spam was our pri- 
mary motivation, we will mostly describe the application of this technology to email, 
though in general it applies equally well to other network services. To read what we 
say in a more general form, replace “email” with “network service”; replace “sender” 
with “client requesting the service”; replace “recipient” or “recipient’s ISP” with 
“service provider”; and replace “email message” with “service request”. 

As was originally pointed out by Dwork & Naor [11], and subsequently used in 
several systems such as HashCash [6] and Camram [8], we can potentially reduce the 
abuse of email by forcing senders to attach to the email proof that the sender has 
performed a lengthy computation as part of the process of sending the email. 

Straightforward computation is not the only plausible cost that we might impose 
on a sender. We might use a function that’s designed to incur delays based on the 
latency of memory systems [2, 10] — this is more egalitarian towards people with 
low-powered computers. We might hope to force the sender to consult a human for 
each message by using a Turing test [3]. We might also rely on real money, and use a 
proof-of-purchase receipt as the proof. In all these cases, after the cost has been in- 
curred the sender can assemble proof that it has been incurred. In this paper we call 
any such proof a “ticket”. Our design is essentially independent of the particular form 
of cost that the ticket represents. 

In all these schemes, it is critical that the sender can’t use the same ticket for lots 
of messages. The way that this has been achieved in the past is by making the email 
message itself be involved in the creation of the proof. For example, you could re- 
quire that the computation be parameterized by the message date, recipient, and some 
abstract or digest of the message body. 
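
As a rough illustration of binding a computational proof to one particular message, the following Python sketch searches for a nonce whose hash, taken together with the message date, recipient and a digest of the body, has a required number of leading zero bits. It is a minimal sketch in the general style of such schemes, not the format used by HashCash or Camram; the function names, the `|` separator and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib


def stamp_input(date, recipient, body):
    """Combine the message date, recipient and a digest of the body."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return f"{date}|{recipient}|{digest}"


def leading_zero_bits(data):
    """Number of leading zero bits in a byte string."""
    return len(data) * 8 - int.from_bytes(data, "big").bit_length()


def mint(date, recipient, body, bits):
    """Search for a nonce; the expected work grows as 2**bits hash evaluations."""
    prefix = stamp_input(date, recipient, body)
    nonce = 0
    while True:
        h = hashlib.sha256(f"{prefix}|{nonce}".encode("utf-8")).digest()
        if leading_zero_bits(h) >= bits:
            return nonce
        nonce += 1


def check(date, recipient, body, nonce, bits):
    """Verification costs a single hash evaluation."""
    prefix = stamp_input(date, recipient, body)
    h = hashlib.sha256(f"{prefix}|{nonce}".encode("utf-8")).digest()
    return leading_zero_bits(h) >= bits
```

Since the search input depends on the recipient and the body, a nonce found for one message is no cheaper to reuse for another than a fresh search, which is the property the paragraph above relies on. It also shows why the work can only start once the message has been composed.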

Unfortunately, this means that the sender must incur the cost after composing the 
message and before committing the message to the email delivery system. The send- 
ing human must wait for the cost to be incurred before knowing that the message has 
been sent. For example, the sender could not disconnect from the network until after 
the computation completes. In addition to causing an unfortunate delay, this synchro- 
nization limits how long a computation we can require, and thus limits the economic 
impact of the computational cost on spammers. 

The ticket server design was created to avoid this problem. By introducing a state- 
ful server, we allow the sender to acquire tickets independently of a particular email 
message. Instead, the ticket server maintains a database of tickets issued. When a 
recipient receives the email, he calls the ticket server to verify that the ticket is not 
being reused, and to update the ticket server’s database to prevent subsequent reuse. 
The operation of the ticket server is reminiscent of how postage stamps work. 

Introducing a stateful server allows us to provide three key benefits: 

(1) Asynchrony: senders can incur the cost (for example, perform the computation) 
well before composing or sending the email (perhaps overnight). This makes the 
whole mechanism much less intrusive on users’ normal workflows, and therefore 
more likely to be acceptable to the non-spamming user community. It also makes 
it reasonable to demand much higher computational costs, since a non-spam 
sender can incur the cost at a convenient time. 

(2) Stockpiles: since the tickets are now independent of what they’ll be applied to, 
users can maintain a stockpile of tickets for future use. They could also acquire 
such a stockpile from elsewhere (for example, bundled with their purchase of 
email software or bundled with signing a contract to use a certain ISP, or pre- 
sented to them by recipients who would welcome their email). 

(3) Refunds: having a stateful server allows us to introduce the notion of an “ac- 
count” for a user. This provides a small convenience for the sender, as a way for 
managing the user’s stockpile of tickets. But it also enables a new feature: if a re- 
cipient receives a ticket with an email from a sender, and the recipient decides 
that the email didn’t need to be paid for, then the recipient can refund the ticket 
to the sender, by telling the ticket server to do so. 

Refunds are potentially a powerful tool. If we can arrange that most tickets on non- 
spam email will be refunded, then most non-spamming users will end up having most 
of their tickets refunded, and have little need to acquire new ones. In that case it 
would be reasonable to make tickets even more expensive, and thereby make it even 
more likely that we can price the tickets in a way that will increase the sender’s cost 
for sending spam to a level comparable to (or even exceeding) that of physical junk 
mail. We will consider the feasibility of this level of refunding in section 3. 

The remainder of the paper is as follows. We provide an overview of our design in 
section 2. In section 3 we show how the ticket server can be applied to the particular 
case of spam reduction. Section 4 outlines a secure protocol for the ticket server 
operations. Section 5 describes an efficient implementation of the ticket server. Section 
6 discusses the issues that would arise from trying to deploy the ticket server and 
apply it to spam reduction. Finally, section 7 discusses some related work, and we 
summarize our conclusions in section 8. 



2 Overview 

In its simplest form, the ticket server provides two operations to its customers: “Re- 
quest Ticket” and “Cancel Ticket”. A third operation, “Refund Ticket”, is also useful. 

The “Request Ticket” operation has no arguments. It returns a “ticket kit” (assum- 
ing the requestor has no account balance, see below). The ticket kit specifies one or 
more “challenges”. The challenges can take a variety of forms, such as computing a 
hard function (paying for the ticket by CPU or memory cycles), or passing a Turing 
test, or paying for the ticket by credit card (in some separate transaction not described 
here). The requestor decides which challenge to accept and does whatever it takes to 
respond to the challenge. The requestor can then take the response, and the rest of the 
ticket kit, and assemble them into a new, valid ticket. (Details of how this is achieved 
are described below). 

The requestor can use this ticket, once, in any manner he chooses (for example, he 
can transmit it as a header line in an email message). Someone receiving a ticket (for 
example, the recipient of the email) can take the ticket and invoke the ticket server’s 
“Cancel Ticket” operation. This operation verifies that the ticket is valid, and that it 
hasn’t been cancelled before. It then records in the ticket server’s database that the 
ticket has been cancelled, and returns to the caller (for example, the recipient of the 
email) indicating that the operation succeeded. Of course, the “Cancel Ticket” opera- 
tion will return a failure response if the ticket is invalid or has previously been can- 
celled. 

Finally, whoever received the ticket and successfully performed the “Cancel 
Ticket” operation can choose to refund the ticket to the originator by invoking the 
“Refund Ticket” operation at the ticket server. This causes the ticket server to credit 
the ticket to the original requestor’s account, by incrementing the requestor’s account 
balance. The ticket server also records in its database the fact that this ticket has been 
refunded. Of course, the ticket server rejects a “Refund Ticket” request for a previ- 
ously refunded ticket. 

When a requestor whose account has a positive balance calls the “Request Ticket” 
operation, instead of a ticket kit (with a challenge) he receives a new, valid, unused 
ticket, and his account balance is decremented. 

The ticket server is designed to guarantee that tickets cannot be forged; that a 
ticket can be cancelled only if it’s valid and hasn’t previously been cancelled; and 
that a ticket can be refunded only after it has been cancelled, and only if authorized 
by the principal who cancelled the ticket, and at most once. We describe later a cryp- 
tographically protected protocol to achieve these guarantees. 
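
The three operations and their guarantees can be modelled in a few lines of Python. This is a minimal in-memory sketch, not the implementation described later: it omits all cryptography and persistence, it does not verify the response to a challenge, and the class name, method names, and the random refund token are assumptions introduced only for illustration.

```python
import secrets


class ToyTicketServer:
    """In-memory model of the three operations; no cryptography, no persistence."""

    def __init__(self):
        self.next_serial = 0
        self.owner = {}        # serial -> account that requested the ticket
        self.cancelled = {}    # serial -> refund token handed to the canceller
        self.refunded = set()  # serials already credited back
        self.balance = {}      # account -> number of prepaid tickets

    def request_ticket(self, account):
        """Return ("ticket", serial) if the account has credit, else ("kit", serial)."""
        serial = self.next_serial
        self.next_serial += 1
        self.owner[serial] = account
        if self.balance.get(account, 0) > 0:
            self.balance[account] -= 1
            return ("ticket", serial)
        return ("kit", serial)  # the requestor must still answer a challenge

    def cancel_ticket(self, serial):
        """Succeed at most once per ticket; return a token that authorizes a refund."""
        if serial not in self.owner or serial in self.cancelled:
            return None
        token = secrets.token_hex(8)
        self.cancelled[serial] = token
        return token

    def refund_ticket(self, serial, token):
        """Credit the original requestor, at most once, and only for the canceller."""
        if serial not in self.cancelled or self.cancelled[serial] != token:
            return False
        if serial in self.refunded:
            return False
        self.refunded.add(serial)
        account = self.owner[serial]
        self.balance[account] = self.balance.get(account, 0) + 1
        return True
```

A refunded ticket raises the requestor's balance, so that account's next request_ticket call returns a complete ticket rather than a kit.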

As will be seen in the detailed protocol description, the use of tickets provides 
some additional benefits. The ticket identifies the requestor by an anonymous account 
identifier created by the ticket server. In addition to the ticket itself, the requestor is 
given a ticket-specific encryption key; the requestor is free to use this key to protect 
material that is shipped with the ticket, since the same key is returned to the caller of 
the “Cancel Ticket” operation. 



3 Application to Spam Reduction 

To use the ticket server for spam reduction, an email recipient (or his ISP) arranges 
that he will see only messages that either have a valid ticket attached, or are from a 
“trusted sender”. This restriction can be implemented in the receiving ISP’s mail 
servers, or by the recipient’s email viewing program. There is a wide range of options 
for the notion of “trusted sender”, which we’ll discuss later in this section. 



3.1 Basic Scenario 

The figure below shows the basic scenario for using tickets for spam prevention, in 
the absence of trusted senders. We describe later what happens if the ticket is omitted, 
and/or the sender is trusted. 

Fig. 1. The basic scenario for using tickets for spam prevention. 

The prospective sender acquires a ticket kit by interacting with the ticket server by 
HTTP, constructs a ticket from the kit by responding to its challenge, and sends the 
ticket to the recipient with the email message by SMTP. The recipient (or the recipi- 
ent’s ISP) validates and cancels the ticket with the assistance of the ticket server by 
HTTP, and optionally uses HTTP once more to refund the ticket. It is of course pos- 
sible for a frequent sender to acquire many ticket kits in a single HTTP request, and 
for a busy ISP to verify, cancel, and optionally refund many tickets in a single HTTP 
request. 



3.2 Who Is a Trusted Sender? 

We envisage several different ways of categorizing a sender as “trusted”. Some send- 
ers should probably be trusted on a per-recipient basis. A user might reasonably de- 
cide to trust everyone in his address book. He might additionally trust everyone to 
whom he has previously sent email. There might also be a mechanism for a user to 
add recipients to a “safe list” exempted from spam prevention. (The safe list would be 
stored by the ISP if the ISP was responsible for checking tickets, or by the user’s 
email viewing program otherwise.) More broadly a user or his ISP might decide that 
some groups of senders are to be trusted. For example, if the ISP has a strong (and 
enforced!) policy of canceling spammers’ accounts, then email from within the same 
ISP should be trusted. It might also be wise to trust senders from other ISPs with 
similarly acceptable anti-spam policies. 

There is of course an issue about trusting senders, given the fact that most SMTP 
transactions do not authenticate the asserted sender name. The ISP can no doubt as- 
sure itself that asserted senders within the ISP’s own domain name space occur only 
on messages originating within the ISP, and that those are authentic. However, there 
is always some suspicion about unauthenticated sender names arriving from outside. 
There are reasonable part-way measures available, though. For example, an ISP could 
accept as authentic a sender in “aol.com” if and only if the SMTP server transmitting 
the message is owned by AOL, as proved by reverse IP address lookup in DNS (and 
similarly for Hotmail, Yahoo, and numerous others). Overall we believe that with 
care it is possible to have a reasonable degree of confidence in the authenticity of a 
very large proportion of sender names. There are other groups currently exploring 
ideas for stronger authentication of sender names in email (see section 7.3). 



3.3 Variations of the Basic Scenario 

We can now explain what happens in all the variations of the basic scenario: messages 
with or without tickets, from trusted or untrusted senders. In the following, the 
actions described as being performed by the recipient’s ISP could equally well be 
performed by the recipient’s email program — either form of deployment would 
work. 

Variation 1: Trusted Sender, but No Ticket 

When a message arrives with no ticket attached, but from a trusted sender, the message 
appears in the recipient’s inbox in the usual way. The ticket server is not involved 
at all. 

Variation 2: Trusted Sender with a Ticket Attached 

When a message arrives with a ticket attached from a trusted sender, it appears in the 
recipient’s inbox in the usual way. Additionally, the ticket server is told that it should 
refund the ticket, crediting the sender’s account. This step can be taken automatically 
by the ISP, and requires no explicit action from the recipient. In this case the “Cancel 
Ticket” and “Refund Ticket” operations at the ticket server can be combined for effi- 
ciency. 

Variation 3: Untrusted Sender with a Ticket Attached 

When a message arrives with a ticket attached from an untrusted sender, the recipient’s 
ISP calls the ticket server’s “Cancel Ticket” operation, to verify and cancel the 
ticket. If the ticket is invalid or previously cancelled, the message is silently dis- 
carded. Otherwise, it appears in the recipient’s inbox. When the recipient sees the 
message, if the recipient decides that the sender should indeed pay for the message, 
he need do nothing more. However, if the recipient decides that this message wasn’t 
spam, the recipient can choose to call the ticket server’s “Refund Ticket” operation, 
to refund the ticket’s value to the sender. (Note that there is an interesting human- 
factors decision to be made here: should “Refund Ticket” require an explicit action 
from the user, or should it be the default, which can be overridden by the user classi- 
fying the message as spam?) 

Variation 4: Untrusted Sender and No Ticket 

When a message arrives without a ticket attached and from an untrusted sender, the 
ISP might choose to respond in one of two ways. First, the ISP might treat the mes- 
sage as suspicious, and flag it but nevertheless deliver it to the recipient. 

Alternatively, the ISP could hold the message (but invisibly to the recipient) and 
send a bounce email to the sender. The bounce email would offer the sender two 
choices: he can provide some previously acquired ticket, or he can acquire a new 
ticket by interacting with the ticket server. 

In the case where the sender chooses to use a previously acquired ticket, he simply 
provides it to the ISP by passing it over HTTP to the ISP (perhaps through an HTML 
form provided as part of the bounce message). On receipt of this, the ISP calls the 
“Cancel Ticket” operation to verify and cancel the ticket, and provided this succeeds, 
makes the message available to the recipient’s inbox. 

Alternatively, if the sender wants to acquire a new ticket at this time, he must call 
the ticket server. To simplify doing so, the bounce email contains a link (URL) to the 
ticket server. Clicking on the link performs a “Request Ticket” operation at the ticket 
server. The result appears to the sender as a web page describing the available chal- 
lenges. For example, for the computational challenge the web page will contain a link 
that would cause the sender to perform the computation (via an ActiveX control or a 
Java applet, perhaps). As another example, the web page resulting from the “Request 
Ticket” operation could include a Turing test such as those used by the Captcha sys- 
tem [3]. In either case, the result of the challenge is combined with the ticket kit data 
(also on the web page), and the resulting ticket is passed via HTTP to the recipient’s 
ISP. The ISP now proceeds as if the message had originally arrived with the ticket 
attached, verifying and canceling the ticket and delivering the message to the recipi- 
ent. 

If a message remains in the “held” state too long without the sender responding to 
the bounce message, it is silently discarded by the ISP. The same happens if the 
sender failed to provide an appropriate return address, or if the sender responds to the 
bounce message with an invalid ticket. 

4 Protocol 

This section describes a specific protocol for the ticket server’s operations. The de- 
scription is fairly abstract, and deliberately leaves a lot of flexibility: in how the data 
is represented and transported (for example, through HTML pages and HTTP in the 
spam scenario described above), in how the cryptography is implemented, and in 
some semantic choices (such as whether to encrypt data in transit or just protect it 
with a message authentication code or MAC [17]). While these are all important 
design decisions for an actual implementation, they are largely irrelevant to the over- 
all ticket server concept. We do describe how to achieve appropriately transactional 
semantics in the presence of communication failures and retransmissions. 

The following datatypes and messages implement the ticket server’s three opera- 
tions. “Request Ticket” is message (1) and its response (2); “Cancel Ticket” is mes- 
sage (4) and its response (5); “Refund Ticket” is message (6) and its response (7). 
The participants in the protocol are the ticket server itself, client “A” who requests a 
ticket, and client “B” who receives the ticket from A and uses it in messages (4) 
through (7). In the case of email spam prevention, A is the sender (or perhaps the 
sender’s ISP) and B is the recipient’s ISP (or perhaps the recipient mail user-agent 
software). 

The functions and values involved are as follows. See section 5.3 for further dis- 
cussion of how the items related to the challenge (P, X, F, and C) are represented. 

• S is a unique identifier for a ticket (in practice, a sequence number issued by the 
ticket server). 

• K_T, K_A, K_B, and K_S are secret keys (for ticket T, for A, for B, and for the ticket 
server). 

• I_A identifies A (and K_A) to the ticket server. 

• I_B identifies B (and K_B) to the ticket server. 

• TransID_A is an identifier chosen by A to identify a “Request Ticket” transaction. 

• TransID_B is an identifier chosen by B to identify a “Use Ticket” transaction. 

• H(D) is a secure hash of some data D (such as might be obtained by applying the 
SHA1 algorithm); H(K, D) is a keyed secure hash of D using a key K. 

• K(D) uses key K to protect some data D in transit. This might provide a secure 
MAC for D, and/or it might encrypt D using K, and/or it might prove the timeli- 
ness of D by including a secured real-time clock value. 

• P is a Boolean predicate. It occurs in a ticket kit (see “TK”, below), where it speci- 
fies a particular, ticket-specific, challenge. It will be represented by an integer or a 
URL. 

• X is the answer to a challenge. It will be represented by an integer or short string. 
If a particular ticket kit contains predicate P, and if P(X), then X is an appropriate 
value for constructing a valid ticket from the ticket kit. In other words, the chal- 
lenge for A is to find an X such that P(X) is true. 

• F is a function that the ticket server will use in verifying that a ticket has a correct 
value of X. It will be represented by an integer. Note that F is visible to the ticket 
requestor. 

• C is a secret that assists the ticket server in verifying a ticket’s X value. It will be 
represented by an integer or short string. For any valid ticket, F(X) = C. 

• M is a message, or other request for a service that might require a ticket. 
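
To make the notation H(K, D) and K(D) concrete, the following sketch uses HMAC with SHA-1 for the keyed hash and implements K(D) in its weakest permitted form, a MAC appended to the data. Encryption and the timeliness check are left out, and the function names protect and unprotect are assumptions of this example rather than part of the protocol.

```python
import hashlib
import hmac


def H(key: bytes, data: bytes) -> bytes:
    """Keyed secure hash H(K, D), here HMAC with SHA-1 (20-byte output)."""
    return hmac.new(key, data, hashlib.sha1).digest()


def protect(key: bytes, data: bytes) -> bytes:
    """K(D) as integrity protection only: the data followed by its MAC."""
    return data + H(key, data)


def unprotect(key: bytes, blob: bytes) -> bytes:
    """Return D if the MAC verifies, otherwise raise ValueError."""
    data, tag = blob[:-20], blob[-20:]
    if not hmac.compare_digest(tag, H(key, data)):
        raise ValueError("integrity check failed")
    return data
```

A receiver that holds the wrong key, or that sees a modified message, gets a ValueError and can discard the message, which matches the "verifies the integrity (or else discards it)" steps in the protocol.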

The ticket server maintains the following state, in stable storage: 

• Its own secret key, K_S. 

• The largest value of S that it has ever issued. 

• State(S) = “Issued” or “Cancelled”, for each ticket S that has been issued (subject 
to a maximum ticket lifetime, not specified in this paper). 

• Balance(I_A) = an integer, the account balance for A’s account. 

• Result(TransID_A) = result of the most recent “Request Ticket” operation that 
contained TransID_A (maintained by the ticket server only for recently requested 
tickets, then discarded). 

• Canceller(S) = TransID_B, the identifier used by B in a recent “Use Ticket” request 
for ticket S (maintained by the ticket server only for recently used tickets, then 
discarded). 

• Refunded(S) = a Boolean indicating whether the ticket S has been refunded to an 
account (subject to a maximum ticket lifetime, not specified in this paper). 

The following are trivially derived from other values: 

• TK = ( S, P, F, I_A, H(K_S, (“Hash for T”, S, F, C, I_A) ) ), a ticket kit. 

• T = ( S, X, F, I_A, H(K_S, (“Hash for T”, S, F, C, I_A) ) ), a ticket issued to A. 

• T.S is the S used in forming T; similarly for the other components of T, and for 
components of TK. 

• K_T = H(K_S, (“Hash for KT”, T.S) ), the requestor’s secret key for T. 

• K_T' = H(K_S, (“Hash for KT prime”, T.S) ), the canceller’s secret key for T. 

• K_A = H(K_S, (“Hash for KA”, I_A) ), the secret key identified by I_A. 

• K_B = H(K_S, (“Hash for KB”, I_B) ), the secret key identified by I_B. 

A ticket T is “valid” if and only if T.S has been issued by the ticket server, and H(K_S, 
(T.S, T.F, Y, T.I_A)) = H_T, where H_T is the keyed hash in T and Y = T.F(T.X). Note 
that this definition includes valid but cancelled tickets (for ease of exposition). 
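
The derived values and the validity check can be sketched as follows. Several details here are assumptions made only so that the example runs: the server key value, the use of repr() to encode a tuple before hashing, the toy function table F_TABLE, and the choice to recompute the labelled hash exactly as it was formed for the kit.

```python
import hashlib
import hmac

K_S = b"ticket-server-master-secret"  # hypothetical server key


def H(key: bytes, fields: tuple) -> bytes:
    """Keyed hash over a tuple; repr() is a stand-in for a real encoding."""
    return hmac.new(key, repr(fields).encode(), hashlib.sha1).digest()


def derive_keys(serial: int, account: str):
    """The per-ticket keys and the account key, each derived from the server key."""
    k_t = H(K_S, ("Hash for KT", serial))
    k_t_prime = H(K_S, ("Hash for KT prime", serial))
    k_a = H(K_S, ("Hash for KA", account))
    return k_t, k_t_prime, k_a


# F travels in the ticket as a small integer that selects a verification function.
F_TABLE = {0: lambda x: x % 7}  # toy F; a real F would be costly to satisfy


def make_kit(serial, predicate, f_index, secret_c, account):
    """Ticket kit: (S, P, F, account id, keyed hash over S, F, C, account id)."""
    tag = H(K_S, ("Hash for T", serial, f_index, secret_c, account))
    return (serial, predicate, f_index, account, tag)


def make_ticket(kit, x):
    """A ticket is the kit with the predicate P replaced by the answer X."""
    serial, _predicate, f_index, account, tag = kit
    return (serial, x, f_index, account, tag)


def is_valid(ticket, issued_serials) -> bool:
    """Recompute the hash with Y = F(X) in place of C and compare with the tag."""
    serial, x, f_index, account, tag = ticket
    if serial not in issued_serials:
        return False
    y = F_TABLE[f_index](x)
    expected = H(K_S, ("Hash for T", serial, f_index, y, account))
    return hmac.compare_digest(tag, expected)
```

The server never stores C per ticket in this sketch: a wrong X yields a Y that differs from C, so the recomputed hash fails to match the one embedded in the ticket.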

The ticket server creates each ticket kit TK in such a way that the ticket con- 
structed by replacing TK.P with X, for any X for which TK.P(X) is true, will be a 
valid ticket. Thus, A should find an X such that P(X) is true. This arrangement turns 
out to be highly flexible, and usable for a wide variety of challenges. Moreover, this 
arrangement has the property that the ticket server does not need to compute X — it 
only verifies it. This property is important in the case where computing X is hard, for 
example when doing so is expensive in CPU time or memory cycles. See the com- 
mentary section for discussion and some examples. 

When client A wishes to acquire a new ticket, it chooses a new TransID_A, and calls 
the ticket server: 

(1) A → ticket server: I_A, K_A( “Request”, TransID_A ) 

The ticket server uses I_A to compute K_A, and verifies the integrity and timeliness of 
the message (or else it discards the request with no further action). Now, there are 
three possibilities: 

• If Result(TransID_A) is already defined, then it is left unaltered. 

• If Balance(I_A) > 0, then it is decremented and Result(TransID_A) is set to a new 
valid ticket T such that State(T.S) = “Issued”. Note that in this case the sender does 
not need to deal with a challenge. 

• If Balance(I_A) = 0, then Result(TransID_A) is set to a ticket kit TK for a new valid 
ticket, such that State(TK.S) = “Issued”. Note that the ticket server does not 
compute the response to the challenge implicit in TK: it’s up to A to do that. 

The ticket server computes K_T from TK.S or T.S, and sends it and Result(TransID_A) 
to A: 

(2) ticket server → A: K_A( “OK”, Result(TransID_A), K_T, TransID_A ) 

A verifies the integrity of this message (or else discards it). Note that if message (2) is 
lost or corrupted, A can retransmit (1), causing the ticket server to retransmit an iden- 
tical copy of (2). If the result is a ticket kit TK, not a complete ticket, then A can take 
TK, solve the challenge by determining some X such that P(X), and assemble the 
complete ticket T from the elements of TK. 
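
The three cases of the “Request Ticket” handling, and the way a retransmitted request obtains an identical reply, can be sketched as below. Message protection is assumed to have been checked already, and the dictionaries, the reply tuples, and the function name are illustrative assumptions rather than the server's actual data structures.

```python
from itertools import count

_serials = count(1)
balance = {}  # account identifier -> prepaid tickets
result = {}   # requestor's transaction id -> reply already computed for it
state = {}    # serial -> "Issued" or "Cancelled"


def request_ticket(account: str, trans_id: str):
    """Handle message (1) after its integrity and timeliness have been verified."""
    if trans_id in result:            # retransmission: replay the stored reply
        return result[trans_id]
    serial = next(_serials)
    state[serial] = "Issued"
    if balance.get(account, 0) > 0:
        balance[account] -= 1
        reply = ("ticket", serial)    # complete ticket, no challenge needed
    else:
        reply = ("kit", serial)       # the requestor must still find X with P(X)
    result[trans_id] = reply
    return reply
```

Checking the stored result first is what keeps a retransmission from decrementing the balance a second time.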

When A wants to use T to send B the message M (or other request for service), A 
sends: 

(3) A → B: T, K_T(M) 

Note that B does not yet know K_T. B now asks the ticket server to change the state of 
T.S to “Cancelled”. B chooses a new TransID_B, and sends: 

(4) B → ticket server: I_B, K_B( “Cancel”, T, TransID_B ). 

The ticket server verifies the integrity of this message (or discards it). If T is not 
“valid” (as defined above), or if State(T.S) = “Cancelled” and Canceller(T.S) is not 
TransID_B, then the result of this call is “Error”. Otherwise, the ticket server sets the 
state of T.S to “Cancelled”, sets Canceller(T.S) to TransID_B, computes K_T, and sends 
it back to B: 

(5) Ticket server → B: K_B( ( “Error” | (“OK”, K_T, K_T' ) ), TransID_B ). 

B verifies the integrity of this message (or else discards it). Note that if message (5) is 
lost or corrupted, B can retransmit (4), causing the ticket server to retransmit an 
identical copy of (5). B can now use K_T to verify the integrity of (3), and to extract M 
from it if it was in fact encrypted with K_T. The key K_T' will be used to authenticate B 
if B decides to refund the ticket. 

In the spam-prevention application, when B is the recipient’s ISP, the ISP will now 
proceed to make the email visible to the recipient. If the recipient decides that M 
should be accepted without payment, then the recipient (or the recipient’s ISP) tells 
the ticket server to recycle this ticket: 

(6) B → ticket server: T.S, K_T'( “Refund”, T ). 

The ticket server verifies the integrity of this message (or discards it). Note that in 
doing so it computes K_T' from T.S, so the verification will succeed only if B truly 
knew K_T'. If Refunded(T.S) is false, the ticket server sets Refunded(T.S) and 
increments Balance(T.I_A). Regardless of the previous value of Refunded(T.S), the ticket 
server then reports completion of the operation: 

(7) Ticket server → B: T.S, K_T'( “OK” ). 
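
The retransmission behaviour of the cancel and refund operations can be sketched in the same style. Ticket validity and message authentication are assumed to have been checked before these functions run; the names and return strings are assumptions of this example, and the state check in refund_ticket is an extra guard added here, not part of the message description.

```python
state = {}        # serial -> "Issued" or "Cancelled"
canceller = {}    # serial -> transaction id of the request that cancelled it
refunded = set()  # serials whose refund has already been applied
balance = {}      # account identifier -> prepaid tickets


def cancel_ticket(serial: int, trans_id: str) -> str:
    """Messages (4) and (5): cancel once, but answer retransmissions identically."""
    if serial not in state:
        return "Error"
    if state[serial] == "Cancelled" and canceller.get(serial) != trans_id:
        return "Error"                 # a different request already used the ticket
    state[serial] = "Cancelled"
    canceller[serial] = trans_id
    return "OK"


def refund_ticket(serial: int, account: str) -> str:
    """Messages (6) and (7): credit the requestor at most once."""
    if state.get(serial) != "Cancelled":
        return "Error"                 # extra guard, see the note above
    if serial not in refunded:
        refunded.add(serial)
        balance[account] = balance.get(account, 0) + 1
    return "OK"                        # reported regardless of earlier refunds
```

Repeating either call with the same arguments changes nothing after the first success, which is the transactional property the protocol relies on when replies are lost.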

Some aspects of the protocol call for further explanation: 

• The transaction ID in step (1) allows for the error case where the response (2) is 
lost. Because the ticket server retains Result(TransID_A), A can retransmit the 
request and get the ticket that he’s paid for (in the case where A’s account was 
decremented) without needing to pay again. The key K_A authenticates A, so that 
nobody else can acquire tickets charged to A’s account. 

• Since I_A is embedded in T and in the secure hash inside T, it authenticates the 
ticket as having been issued to A (or more precisely, to a principal that knows I_A 
and K_A). 

• The key K_T returned to A in step (2) allows A to construct K_T(M); in other words, 
it authorizes A to use T for a message of A’s choosing. 

• When B receives (3), B does not know K_T, so B cannot reuse T for its own 
purposes. B acquires K_T only after canceling the ticket. 

• The transaction ID used in request (4) allows B to retransmit this request if the 
response is lost. Because the ticket server has recorded Canceller(T.S), it can de- 
tect the retransmission and return a successful outcome even though the ticket has 
already been cancelled in this case. 

• If the result (7) is lost, B can retransmit the request. Because of the ticket server’s 
“Refunded” data structure, this will not cause any extra increment of A’s account 
balance. 

• The use of K_T' in messages (6) and (7) authenticates the principal authorized to 
reuse T. Only the principal that cancelled T can credit A’s account. It’s fine for A to 
do this himself, but if he does so he can’t use T for any other purpose: it’s been 
cancelled. 



5 Implementation 

We have implemented the ticket server, and applied it in a prototype spam-prevention 
system. In this section we outline the techniques that we used, which provide a 
simple, robust, and highly efficient server. At the end of this section we discuss the 
performance of this design. 

5.1 Ticket State 

The ticket server uses sequential serial numbers when creating the field S for a new 
ticket. The ticket server maintains in stable storage an integer which indicates the 
largest serial number that it has used in any ticket. It’s easy to implement this effi- 
ciently, if you’re willing to lose a few serial numbers when the ticket server crashes. 
The ticket server maintains in volatile memory the largest serial number that has 
actually been used (S_used) and in stable storage a number always larger than this 
(S_bound). When enough serial numbers have been used that S_used is approaching 
S_bound, the ticket server rewrites S_bound with the new value S_used + K, where K is 
chosen to make the number of stable storage writes for this process insignificant 
(K = 1,000,000 would suffice). If the ticket server crashes, up to K serial numbers 
might be lost. In no case will serial numbers be reused. 
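
The serial-number scheme can be sketched as follows, with a dictionary standing in for stable storage. The class and attribute names are assumptions of this example, the step parameter plays the role of K (and should be at least 2 so that the stored bound stays strictly above the last number used), and the writes counter exists only to show how rarely the stable record is touched.

```python
class SerialAllocator:
    """Hands out serial numbers, touching stable storage once per `step` tickets."""

    def __init__(self, stable: dict, step: int = 1_000_000):
        self.stable = stable  # dict standing in for a record on disk
        self.step = step
        # After a restart, skip to the stored bound: numbers below it may have
        # been issued before the crash, so they are abandoned rather than reused.
        self.used = stable.get("bound", 0)
        self.writes = 0

    def next_serial(self) -> int:
        if self.used + 1 >= self.stable.get("bound", 0):
            self.stable["bound"] = self.used + self.step  # the rare stable write
            self.writes += 1
        self.used += 1
        return self.used
```

Restarting with the same stable record resumes above every number that could have been issued before the crash, at the price of skipping at most `step` numbers.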

The ticket server maintains 2 bits of state for each ticket S (with S < S_bound). One bit 
is set iff the ticket has been cancelled, and the other is set iff the ticket has been 
refunded. The “truth” for these bits necessarily resides on disk, and the server must read 
it from there after any cold restart. However, during normal running the ticket server 
can maintain this in volatile memory with an array indexed by ticket serial number. 

When a ticket changes state (from “issued” to “cancelled” or from “cancelled” to 
“refunded”) the ticket server updates this array, and synchronously with the operation 
records the transition in stable storage. All it needs to record is the ticket serial num- 
ber, and the new state. 
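
A packed array with two bits per serial number might look like the sketch below. The class name, the flag values, and the in-memory list standing in for the stable-storage log are assumptions made for illustration.

```python
class TicketBits:
    """Two bits per serial number: bit 0 = cancelled, bit 1 = refunded."""

    CANCELLED, REFUNDED = 1, 2

    def __init__(self, capacity: int):
        self.bits = bytearray((capacity + 3) // 4)  # four tickets per byte
        self.log = []                               # stand-in for the stable log

    def get(self, serial: int) -> int:
        """Return the two state bits for one ticket."""
        return (self.bits[serial // 4] >> (2 * (serial % 4))) & 3

    def set_flag(self, serial: int, flag: int) -> None:
        """Update the in-memory array and record the transition."""
        self.bits[serial // 4] |= flag << (2 * (serial % 4))
        self.log.append((serial, flag))             # serial number and new state
```

After a cold restart the array would be rebuilt by replaying the logged transitions in order.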



5.2 Ticket Request and Cancellation Logs 

It is possible that the ticket server will crash in the vicinity of responding to a “Re- 
quest Ticket” operation. However, the worst that this does to A is to cause him to lose 
a single ticket, which we believe is acceptable. We saw no need, in this application, to 
keep a log in stable storage that would allow A to replay his request after the server 
restarts. 

The ticket server does need to maintain a volatile data structure containing Re- 
sult(TransID^). However, this is only relevant for as long as it might receive a trans- 
mission of the corresponding “Request Ticket” operation. The ticket server can apply 
a relatively short limit to this time interval, to keep the data structure small. 



5.3 Representing the Challenge 

The protocol values P, X, F, and C are used to represent the challenge and the proof 
that A has responded appropriately, and to assist the ticket server in verifying this 
response. It’s a somewhat complex mechanism, because it’s designed to deal with a 
variety of challenge styles (computational, Turing test, perhaps others). Here are 
some examples of how these values can be used. 

In all cases F is represented by a small integer, indicating to the ticket server what 
it should do to verify that a ticket has an appropriate value of X, i.e., that F(X) = C. 

For a computational test, P describes the test. It can be represented by a small 
integer P_f and a parameter P_n. The small integer selects a function P_f to be computed, 
from a list known to the participants in the protocol. The challenge for A is to find a 
value X such that P_f(P_n, X) = 0. For this case, C is always 0 and the ticket server 
implements F(X) by computing P_f(P_n, X). 

For a Turing test that requires recognizing a word from a distorted image, P speci- 
fies the image. We have not implemented this case, but it seems straightforward. We 
could use the image itself as P, but more likely we would use a URL at which the 
image resides. The challenge for A is to find a string X such that X is embedded in 
the image. For this case C is the correct string, and the ticket server implements F(X) 
by comparing the strings X and C. 
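
Both challenge styles can be illustrated with a short sketch. The hash-based function p_hash, its 12-bit difficulty, and the one-entry function table are assumptions chosen so that the example runs quickly; they are not the functions used in the prototype.

```python
import hashlib
from itertools import count


def p_hash(p_n: int, x: int) -> int:
    """Return 0 when SHA-1 of (p_n, x) starts with 12 zero bits, else nonzero."""
    digest = hashlib.sha1(f"{p_n}:{x}".encode()).digest()
    return int.from_bytes(digest[:2], "big") >> 4


P_FUNCTIONS = {1: p_hash}  # the list of functions known to the participants


def solve(p_f: int, p_n: int) -> int:
    """Requestor's side: search for an X with P_f(P_n, X) = 0."""
    func = P_FUNCTIONS[p_f]
    return next(x for x in count() if func(p_n, x) == 0)


def verify_computational(p_f: int, p_n: int, x: int) -> bool:
    """Server's side: a single evaluation, compared with C = 0."""
    return P_FUNCTIONS[p_f](p_n, x) == 0


def verify_turing(x: str, c: str) -> bool:
    """Turing-test case: F(X) is a comparison of X with the secret string C."""
    return x == c
```

The asymmetry is the point: solve performs thousands of evaluations on average, while either verification function performs one.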

5.4 Performance 

Although we have not fully tuned our implementation, we believe that this ticket 
server design, implemented on a single inexpensive PC, could handle as many ticket 
operations as could be accommodated on a 100 Mb/sec network connection. 

• In terms of memory usage, this implementation uses approximately 2 bits of main 
memory for each outstanding uncancelled ticket. So each GByte of DRAM suf- 
fices to keep the state for 4 billion tickets, roughly one week’s email traffic at 
Hotmail (after we ignore the messages that Hotmail already categorizes as spam, 
and before allowing for the fact that email messages from trusted senders don’t 
need tickets). 

• In terms of disk performance, each operation (Request, Cancel, and Refund) re- 
quires a disk write. However, this doesn’t need to overload the disk channel. We 
use a simple “group commit” design to keep the delays caused by the disk trans- 
fers negligible. To commit one of the operations we need to write only 16 bytes to 
disk: 8 bytes for S and 8 bytes for either I_A (for Request or Refund) or TransID_B 
(for Cancel). If the operation waits until any previous disk request has completed, 
and meanwhile we accumulate the transaction records for multiple operations into 
a 4 KByte buffer, we can record 256 operations with a single 4 KByte disk write. 
Writing 4 KBytes takes about 16 milliseconds even on an ATA disk, allowing us 
to record about 16,000 ticket transactions per second on a single disk. So while 
each individual operation incurs the latency of a 4 KByte disk request, the overall 
throughput of the ticket server is not limited by the disk. 

• In terms of CPU performance, our current implementation is still unsatisfactorily 
slow. Most of the time is spent in the overheads of communicating through our 
TCP stack, which has not yet been optimized for this application. At the moment, 
the protocol and its cryptography are not significant in limiting the server’s per- 
formance. 
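
The memory and disk figures quoted above follow from simple arithmetic, reproduced here as a check. The constants are the ones stated in the text; only the variable names are introduced for this example.

```python
GBYTE_BITS = 8 * 2**30               # bits in one GByte of DRAM
tickets_per_gbyte = GBYTE_BITS // 2  # two bits of state per ticket

RECORD_BYTES = 16                    # 8 bytes for S plus 8 bytes of identifier
BUFFER_BYTES = 4096                  # one group-commit buffer
WRITE_MS = 16                        # time quoted for one 4 KByte write

records_per_write = BUFFER_BYTES // RECORD_BYTES
writes_per_second = 1000 / WRITE_MS
operations_per_second = records_per_write * writes_per_second

print(tickets_per_gbyte)      # 4294967296, i.e. about 4 billion tickets
print(records_per_write)      # 256
print(operations_per_second)  # 16000.0
```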

Of course, full deployment of this design would require multiple ticket servers, and 
arranging cooperation between them is a significant design problem. We discuss this, 
briefly, in the next section. 



6 Deployment Issues 

If everyone’s email system used the ticket server, it seems highly likely that we 
would have solved the spam problem. We can tune the effective cost of sending email 
so as to increase it, in real dollar terms, to a level comparable to physical mail: a few 
cents per message instead of 0.019 cents. This would clearly cause the demise of the 
more absurd sorts of spam, since the economic model would no longer support mail- 
ings with extremely low response rates. As with physical mail, a small level of junk 
traffic would necessarily remain. However, there are several problems with the ticket 
server idea, and any of them might prevent it succeeding. We address them next. 
6.1 User Acceptance 

Certainly any scheme based on making the sender incur costs will make email mar- 
ginally less pleasant for normal users. The question here is how large that margin is. 
We believe it’s small. Two aspects of the design encourage us in this belief. First, 
most senders will fall into the category of trusted senders for almost all of their re- 
cipients: in general we send email to people who know us. We’ll be in their address 
book, in the same organization as them, or they’ll accept our email as being non-spam 
and they’ll refund our ticket. The net effect is that most users will actually consume 
very few tickets. If we start everyone out with a moderate stockpile (for example, 
1000 tickets bundled with a software purchase), they most likely will never be incon- 
venienced by needing to perform a lengthy computation. 

There is a separate issue about users of low-powered devices: PDAs, phones, and 
old computers. One possibility we have already mentioned is use of memory-bound 
functions instead of CPU-bound computations. Since memory speed is relatively 
uniform across devices, there is not a lot of disparity in the cost of tickets. 
Additionally, email access from PDAs and phones is always through an intermediary 
with whom the user has a service contract. It would be perfectly reasonable for that 
intermediary to provide a moderate number of tickets to each user as part of that contract. 

The most intrusive part of the scheme occurs in a transition period, where some re- 
cipients are requiring use of the scheme and some senders have not yet adopted it. 
This will produce irritating bounce messages demanding tickets. This is a problem of 
“you can’t get there from here”, and might well be a fatal flaw (although it’s one 
shared by virtually all other anti-spam technologies apart from ever-smarter filtering 
rules). The most plausible way to avoid the flaw would be to get the scheme distrib- 
uted, but disabled, in a wide range of email software, and once it’s widely deployed 
then starting to actually use it. We don’t know if this is feasible. 



6.2 Trust 

The scheme depends on everyone having some level of trust in the ticket server. 
While it’s easy to see that, for example, a Hotmail user would be willing to trust a 
ticket server run by Microsoft, it’s much less clear why anyone else should. So while 
an organization such as Hotmail could unilaterally start requiring tickets, it is less 
likely that any one organization could provide a universal ticket service (nor would 
they want to). 

Equally, if several organizations (Hotmail, AOL, Yahoo, etc.) independently 
started using ticket servers, senders would be distressed at the complexity of acquir- 
ing all the appropriate forms of ticket. It would be like the postal system without the 
international agreements on mutual acceptance of foreign postage stamps. 

So it is more likely that a set of ticket servers would be run cooperatively by several 
organizations, with contractual agreements about accepting each other’s tickets. This 
is less unlikely than it might seem: there are some extremely large, but highly 
competitive, players in the email business (AOL, Hotmail, Yahoo, and MSN). A system 
run jointly by them would have a reasonable appearance of neutrality (at least within 
the U.S.), good credibility, and would automatically be trusted by a very large 
community of users. There would be many technical (and business) issues in how such a 
federation would work, and we do not explore them in this paper. 



6.3 Reliability 

Introduction of the ticket server adds a new failure mode to the email system. No 
doubt we could build a ticket server that had negligibly few programming bugs, but 
of course that isn’t enough. Certainly we would need multiple servers, for tolerance 
of physical and environmental failures, and to handle the total load. Some such 
replication would come from a federation created to handle the trust issues, but 
probably more would be needed. 

A more worrisome problem is that the ticket server would be an appealing target 
for a denial-of-service attack. But the large email services are already that appealing. 
It’s not terribly hard to engineer the ticket server to be able to handle anything that its 
network connection can handle, and if we assume that the servers are co-resident with 
existing large data centers such as those of Hotmail and AOL, then the problem 
reduces to the (admittedly unsolved) one of protecting those data centers. 

Finally, note that the ticket server need not be essential, and the consequences of a 
successful attack need not be dire. If a sender cannot contact the ticket server, and his 
local stockpile of tickets is empty, he can send the email anyway and just hope that he 
doesn’t get a “ticket needed” bounce message. If a recipient cannot contact the ticket 
server, he can just read the email regardless. 



6.4 Cheating 

The ticket server protocol is designed to prevent obvious attacks, like forging or re- 
using tickets. There is one behavior pattern that might be considered at least “mis- 
use”: ticket farming. A spammer might, for example, acquire tickets by running a 
popular web site (such as a gamer site or a porn site) and requiring that customers 
perform the challenge part of ticket construction before being rewarded by the web 
site (for example, being allowed to play or to see a picture). This isn’t really a viola- 
tion of the protocol. We make the sender incur a cost, but fundamentally there’s no 
way we can prevent the sender delegating the required work. A similar criticism ap- 
plies to using Turing tests: there are, unfortunately, parts of the world where labor is 
so cheap as to make the cost of the tests exceedingly small. 



7 Related Work 

There are several other systems aimed at causing bulk emailers to incur costs when 
sending their email. The earliest proposal we know is that of Dwork and Naor [11], 
which we used as one of the starting points for the current work. Actual deployed 
systems of this sort include Back’s HashCash system [6]. The Camram system [8] 
adds mechanisms that in effect automate the accumulation of a safe list. None of 
these systems provide the benefits of the ticket server that we outlined in section 1 of 
this paper. 

The ticket server is reminiscent of several other classes of system, although in ag- 
gregate it has distinctly different properties than any of them. There are authentication 
systems that work by issuing “tickets”; there are numerous systems aimed at offering 
and enforcing small-scale payments (“micro-payments”); there are other systems for 
incurring cost when sending email; and there are other systems intended to reduce 
spam. We consider here a few examples from those classes of system. 



7.1 Authentication & Authorization Systems 

Systems such as Kerberos [16] produce “tickets” as the result of an authentication. 
These tickets, like those from the ticket server, enable the clients to perform some 
operations (such as accessing a file system or a network service). However, the func- 
tionality of the tickets is quite different: there is no notion of tickets being consumed 
or cancelled, nor being refunded. The only way in which tickets become invalid is by 
expiry after a timeout (or conceivably through a revocation mechanism). 

Note that the clients of the ticket server, despite being identified by an ID, can be 
anonymous. We verify that a client is the same client as one who made a previous 
request, but we have no knowledge whatsoever of the client’s identity. Clients can 
casually acquire or discard identities. 

Our tickets are also reminiscent of capability systems such as Amoeba [19], except 
that they are explicitly single-use and the ticket server enforces this. We are unaware 
of any capability system that provides such a feature. Note that the enforcement of 
single-use necessarily produces a protocol structure involving more communication 
with the ticket server (just as a revocation mechanism does in capability systems). 

It is conceivable that one could extend Kerberos or Amoeba to make them into 
tools for preventing spam and similar abuses, much like the ticket server, but the 
extensions would be quite substantial, and it seems preferable to start from scratch 
rather than attempt to build a more complex system on the prior substrates. 



7.2 Micro-payment Systems 

Micro-payment systems come closest in functionality to the ticket server. They 
achieve very similar goals: the purchaser incurs a cost. It might conceivably be feasi- 
ble to take an existing micro-payment system and rework it to use some form of vir- 
tual currency instead of real cash. But again, we believe that we will produce a more 
satisfactory system by building it from scratch with the present applications in mind. 
The differences are substantial, as we show next. 

One such system was Millicent [14], which is a convenient example (partly be- 
cause the authors are familiar with it, but primarily because it was developed into a 
real, deployed system with transactions involving real cash). In Millicent, as in the 
ticket server, there is protection against double spending. However, to avoid 
excessive interactions with an online authority, that’s done by making currency (“scrip” in 
Millicent terminology) vendor-specific. For an application such as spam prevention, 
this would correspond to having scrip specific to the recipient’s ISP. The ticket 
server, by providing a recipient-independent online verification system, avoids this. 
Also, the simplicity of the ticket server protocols and data structures (and the fact that 
an ISP can batch its requests to the ticket server) makes using the ticket server as an 
online authority acceptably efficient for our applications. 

There are numerous other issues that spring to mind when considering the use of a 
straight micro-commerce system. For example, sending email from your email ac- 
count at your employer to your personal account at home would in effect steal money 
from your employer. 

7.3 Email Systems with Stronger Semantics 

There are many email enhancements that provide stronger semantics than basic 
RFC822 and SMTP email. For example, one can use PGP, S/MIME, and many other 
elaborate systems to provide features such as certified, authenticated, or secure email. 
In particular, properly authenticated email would most likely be a powerful tool for 
reducing spam, since recipients would at least be able to filter out email from repeat 
offenders or groups. But authentication is not by itself a solution: knowing the author 
of an email message isn’t sufficient to classify the message as being spam or not. 
Authentication does not address the basic problem, namely that the sender incurs too 
small a cost when sending email, with the result that vast quantities of worthless 
email flood our inboxes. (See [1, 7, 21, 22] for some design possibilities in this area, 
and for numerous further references.) 



8 Conclusion 

The ticket server is a new tool that we can use to control or limit the use of otherwise 
free and open services, such as email. By adding a shared, stateful server to the earlier 
proposals, we get significant benefits. In particular, the cost to be incurred for a ser- 
vice can be independent and prior to any particular requests. Moreover, clients may 
have account balances and may receive refunds. Therefore, we can escalate the cost 
of the service to a level where we can be certain that it will deter excessive use. 

We have designed a protocol that will allow the ticket server to guarantee the cor- 
rect functioning of the tickets, and prevent or make impractical any attempts to cheat 
by forging, stealing, or otherwise maltreating tickets. We have also designed and 
implemented a concrete mechanism to invoke this protocol with a combination of 
email messages and HTTP requests. Furthermore, we have outlined how the ticket 
server can be implemented simply and cheaply. 

There remain numerous questions about the ticket server. As discussed earlier, de- 
ployment of the service raises several tricky issues. Many of the same issues arise 
with other schemes for email payment: in all cases, deployment is difficult, involving 
fundamental and disruptive changes to the way that Internet email works. It’s not at 
all clear that we can achieve such changes. On the other hand, removing spam would 
be a significant financial benefit to the major email service providers, and a conven- 
ience for all of us. 
Abstract. Predicate detection is an important problem in distributed 
systems. Based on the temporal interactions of intervals, there exists a 
rich class of modalities under which global predicates can be specified. 
For a conjunctive predicate $\phi$, we show how to detect the traditional 
$Possibly(\phi)$ and $Definitely(\phi)$ modalities along with the added 
information of the exact interaction type between each pair of intervals (one 
interval at each process). The polynomial time, space, and message 
complexities of the proposed on-line detection algorithms to detect Possibly 
and Definitely in terms of the fine-grained interaction types per pair of 
processes are the same as those of the earlier on-line algorithms that 
can detect only whether the Possibly and Definitely modalities hold. 



1 Introduction 

Predicate detection in a distributed system is useful in many contexts such as 
monitoring, synchronization and coordination, debugging, and industrial process 
control [2,4,6,7,8,14,16,17]. Marzullo et al. defined two modalities under which 
predicates can hold for a distributed execution [4,14]. 

— $Possibly(\phi)$: There exists a consistent observation of the execution such that 
$\phi$ holds in a global state of the observation. 

— $Definitely(\phi)$: For every consistent observation of the execution, there exists 
a global state of it in which $\phi$ holds. 
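
The two definitions can be made concrete by brute force on a tiny execution. The sketch below is not one of the detection algorithms of this paper: it enumerates the lattice of consistent cuts directly, which is exponential in general. The encoding is an assumption made for illustration: `local_truth[i][k]` says whether the local predicate of process `i` holds after its first `k` events, and each message is a hypothetical tuple `(src, send_index, dst, recv_index)` with 1-based event indices.

```python
from itertools import product


def is_consistent(cut, messages):
    """A cut is consistent if every message it has received was also sent."""
    return all(cut[src] >= send
               for (src, send, dst, recv) in messages
               if cut[dst] >= recv)


def holds(cut, local_truth):
    """Conjunctive predicate: every local predicate is true in the cut."""
    return all(local_truth[i][k] for i, k in enumerate(cut))


def possibly(local_truth, messages):
    """True if some consistent global state satisfies the predicate."""
    ranges = [range(len(states)) for states in local_truth]
    return any(is_consistent(c, messages) and holds(c, local_truth)
               for c in product(*ranges))


def definitely(local_truth, messages):
    """True if every observation passes through a satisfying global state."""
    start = tuple(0 for _ in local_truth)
    final = tuple(len(states) - 1 for states in local_truth)
    if holds(start, local_truth):
        return True
    seen, stack = {start}, [start]
    while stack:                      # search only through non-satisfying states
        cut = stack.pop()
        if cut == final:
            return False              # some observation avoided the predicate
        for i in range(len(cut)):
            if cut[i] < final[i]:
                nxt = cut[:i] + (cut[i] + 1,) + cut[i + 1:]
                if (nxt not in seen and is_consistent(nxt, messages)
                        and not holds(nxt, local_truth)):
                    seen.add(nxt)
                    stack.append(nxt)
    return True


if __name__ == "__main__":
    short = [[False, True, False], [False, True, False]]
    long_ = [[False, True, True, False], [False, True, True, False]]
    # Concurrent intervals: they may overlap, but need not.
    print(possibly(short, []), definitely(short, []))
    # Each interval starts before the other one ends: overlap is forced.
    forced = [(0, 1, 1, 2), (1, 1, 0, 2)]
    print(possibly(long_, forced), definitely(long_, forced))
    # First interval ends before the second starts: no overlap at all.
    print(possibly(short, [(0, 2, 1, 1)]), definitely(short, [(0, 2, 1, 1)]))
```

The three runs correspond to the three coarse cases: Possibly without Definitely, Definitely, and not Possibly.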

The formalism and axiom system given in [9] identified an orthogonal set $\Re$ of 
40 fine-grained temporal interactions between a pair of intervals in a distributed 
execution. It was shown in [10] that this formalism provides much more 
expressive power than the Possibly and Definitely modalities, and a mapping from $\Re$ 
to the Possibly and Definitely modalities was given. A conjunctive predicate $\phi$ 
is of the form $\bigwedge_i \phi_i$, where $\phi_i$ is any predicate defined on variables 
local to process $P_i$. We show that for a conjunctive predicate (e.g., $x_i = 2 \wedge y_j > 8$), 
$Possibly(\phi)$ and $Definitely(\phi)$ can be detected along with the added 
information of the exact interaction type between each pair of intervals, one interval at 
each process. This provides flexibility and power to monitor, synchronize, and 
control distributed executions. The time, space, and message complexities of the 
proposed on-line, centralized detection algorithms (Algorithms Fine-Poss and 
Fine-Def, the main results) to detect Possibly and Definitely in terms of the 
fine-grained modalities per pair of processes are the same as those of the earlier 
on-line, centralized algorithms [6,7] that can detect only whether the Possibly 
and Definitely modalities hold. Table 1 compares the complexities. Fine-Rel, 
which is an intermediate problem we need to solve, is introduced later. 

Table 1. Comparison of space, message and time complexities. $n$ = number of 
processes, $M$ = maximum queue length at $P_0$, $p$ = maximum number of intervals 
occurring at any process, $m_s$ = total number of messages exchanged between all the 
processes. Note: $p \geq M$, as all the intervals may not be sent to $P_0$. 

| Algorithm | Avg. time complexity at $P_0$ | Total number of messages | Space at $P_0$ (= total msg. space) | Avg. space at $P_i$, $i \in [1, n]$ |
|---|---|---|---|---|
| GW94 [6] (Possibly) | $O(n^2 M)$ or $O(n m_s)$ | $O(m_s)$ | $O(n^2 M)$ or $O(n m_s)$ | $O(n)$ |
| GW96 [7] (Definitely) | $O(n^2 M)$ or $O(n m_s)$ | $O(m_s)$ | $O(n^2 M)$ or $O(n m_s)$ | $O(n)$ |
| Fine-Poss, Fine-Def, Fine-Rel | $O(n^2 M)$ or $O(n[\min(4m_s, np)])$ | $O(\min(4m_s, np))$ | $O(\min[(4n - 2)np, 10nm_s])$ | $O(n)$ |

The power of our approach stems from the use of intervals as opposed to 
individual events in the distributed execution. The intervals at each process are 
identified to be the durations during which the local predicate is true [10,12]. 
We now state Problems Fine-Poss and Fine-Def. 

Problem Fine-Poss Statement. For a conjunctive predicate $\phi$, determine 
on-line if $Possibly(\phi)$ is true. If true, identify the fine-grained pairwise 
interaction between each pair of processes when $Possibly(\phi)$ first becomes true. 
Problem Fine-Def Statement. For a conjunctive predicate $\phi$, determine 
on-line if $Definitely(\phi)$ is true. If true, identify the fine-grained pairwise 
interaction between each pair of processes when $Definitely(\phi)$ first becomes true. 

Section 2 gives the background and objectives. Section 3 presents the frame- 
work and data structures. Section 4 and Section 5 present the on-line algorithms. 
Section 6 gives the conclusions. 

2 System Model, Background, and Objectives 

2.1 System Model 

We assume an asynchronous distributed system in which $n$ processes 
communicate only by reliable message passing. We do not assume FIFO channels. To 
model the system execution, let $\prec$ be an irreflexive partial ordering representing 
the causality relation on the event set $E$. $E$ is partitioned into local executions at 
each process. Let $N$ denote the set of all processes. Each $E_i$ is a linearly ordered 
set of events executed by process $P_i$. An event $e$ at $P_i$ is denoted $e_i$. The causality 
relation on $E$ is the transitive closure of the local ordering relation on each $E_i$ 
and the ordering imposed by message send events and message receive events 
[13]. A cut $C$ is a subset of $E$ such that if $e_i \in C$ then 
$(\forall e_i')\ e_i' \prec e_i \Rightarrow e_i' \in C$. 

Global Predicate Detection under Fine-Grained Modalities 

Table 2. Dependent relations for interactions between intervals are given in the first 
two columns [9]. Tests for the relations are given in the third column [10]. 

| Relation $r$ | Expression for $r(X, Y)$ | Test for $r(X, Y)$ |
|---|---|---|
| R1 | $\forall x \in X\ \forall y \in Y,\ x \prec y$ | $V_y^-[x] > V_x^+[x]$ |
| R2 | $\forall x \in X\ \exists y \in Y,\ x \prec y$ | $V_y^+[x] > V_x^+[x]$ |
| R3 | $\exists x \in X\ \forall y \in Y,\ x \prec y$ | $V_y^-[x] > V_x^-[x]$ |
| R4 | $\exists x \in X\ \exists y \in Y,\ x \prec y$ | $V_y^+[x] > V_x^-[x]$ |
| S1 | $\exists x \in X\ \forall y \in Y,\ x \not\preceq y \wedge y \not\preceq x$ | if $V_y^-[y] \not< V_x^-[y] \wedge V_y^+[x] \not> V_x^+[x]$ then $(\exists x^0 \in X : V_y^-[y] \not\leq V_x^{x^0}[y] \wedge V_x^{x^0}[x] \not\leq V_y^+[x])$ else false |
| S2 | $\exists x_1, x_2 \in X\ \exists y \in Y,\ x_1 \prec y \prec x_2$ | if $V_y^+[x] > V_x^-[x] \wedge V_y^-[y] < V_x^+[y]$ then $(\exists y^0 \in Y : V_x^+[y] \not< V_y^{y^0}[y] \wedge V_y^{y^0}[x] \not< V_x^-[x])$ else false |

A consistent cut is a downward-closed subset of $E$ in $(E, \prec)$ and denotes an 
execution prefix. For event $e$, there are two special consistent cuts $\downarrow e$ and $e\uparrow$. 
$\downarrow e$ is the maximal set of events that happen before $e$. $e\uparrow$ is the set of all events up 
to and including the earliest events at each process for which $e$ happens before 
the events. 

Definition 1. Cut $\downarrow e$ is defined to be $\{e' \mid e' \prec e\}$ and cut $e\uparrow$ is defined to be 
$\{e' \mid e' \not\succ e\} \cup \{e_i,\ i = 1, \ldots, |N| \mid e_i \succ e \wedge (\forall e_i' \prec e_i,\ e_i' \not\succ e)\}$. 

The system state after the events in a cut is a global state; if the cut is 
consistent, the corresponding system state is a consistent global state. We assume 
that the popular vector clocks are available [5,15]. The vector clock $V$ has the 
property that $e \prec f \iff V(e) < V(f)$. 
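
The vector clock property stated above reduces a causality test to a comparison of timestamps. As a minimal sketch, assuming the usual componentwise order on vectors (the function names `vc_less` and `concurrent` are hypothetical, chosen for this example):

```python
def vc_less(u, v):
    """Strict vector order: u <= v in every component and u != v."""
    if len(u) != len(v):
        raise ValueError("timestamps must have the same number of components")
    return (all(a <= b for a, b in zip(u, v))
            and any(a < b for a, b in zip(u, v)))


def concurrent(u, v):
    """Two events are concurrent if neither timestamp is less than the other."""
    return not vc_less(u, v) and not vc_less(v, u)


if __name__ == "__main__":
    print(vc_less([1, 0], [1, 1]))      # the first event causally precedes the second
    print(concurrent([1, 0], [0, 1]))   # neither event precedes the other
```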

A conjunctive predicate $\phi$ is of the form $\bigwedge_i \phi_i$, where $\phi_i$ is a predicate defined 
on variables local to process $P_i$. The intervals of interest at each process are the 
durations in which the local predicate is true. Such an interval at process $P_i$ is 
identified by the (totally ordered) subset of adjacent events of $E_i$ for which the 
local predicate is true. 



2.2 Pairwise Interactions 

There are 29 or 40 possible mutually orthogonal ways in which any two durations 
can be related to each other, depending on whether the dense or the nondense 
time model is assumed [9]. Informally speaking, with dense time, $\forall x, y$ in 
interval $A$, $x \prec y \Rightarrow \exists z \in A \mid x \prec z \prec y$. These orthogonal interaction types 
were identified by first using the six relations given in the first two columns of 
Table 2. Relations R1 (strong precedence), R2 (partially strong precedence), R3 
(partially weak precedence), and R4 (weak precedence) define causality conditions, 
whereas S1 and S2 define coupling conditions. Assuming that time is dense, it 
was shown in [9] that there are 29 possible interaction types between a pair of 
intervals, as given in the upper part of Table 3. Of the 29 interactions, there are 
13 pairs of inverses, while three are inverses of themselves. The twenty-nine 
interaction types are specified using boolean vectors. The six relations R1-R4 and 
S1-S2 form a boolean vector of length 12 (six bits for $r(X, Y)$ and six bits for 
$r(Y, X)$). The interaction types are illustrated in [9]. The nondense time model, 
whose importance is given in [9], permits 11 interaction types between a pair of 
intervals, defined in the lower part of Table 3, besides the 29 identified before. 
Of these, there are five pairs of inverses, while one is its own inverse. These 
interaction types are illustrated in [9]. The set of 40 relations is denoted as $\Re$. 

Table 3. The 40 independent relations in $\Re$ [9]. $X$ and $Y$ are intervals. The upper 
part of the table gives the 29 relations assuming dense time. The lower part of the 
table gives 11 additional relations if nondense time is assumed. 

Upper part (dense time): 

| Interaction type | R1(X,Y) | R2(X,Y) | R3(X,Y) | R4(X,Y) | S1(X,Y) | S2(X,Y) | R1(Y,X) | R2(Y,X) | R3(Y,X) | R4(Y,X) | S1(Y,X) | S2(Y,X) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $IA\ (= IQ^{-1})$ | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| $IB\ (= IR^{-1})$ | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| $IC\ (= IV^{-1})$ | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| $ID\ (= IX^{-1})$ | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| $ID'\ (= IU^{-1})$ | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| $IE\ (= IW^{-1})$ | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| $IE'\ (= IT^{-1})$ | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| $IF\ (= IS^{-1})$ | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| $IG\ (= IG^{-1})$ | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| $IH\ (= IK^{-1})$ | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| $II\ (= IJ^{-1})$ | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| $IL\ (= IO^{-1})$ | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| $IL'\ (= IP^{-1})$ | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| $IM\ (= IM^{-1})$ | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| $IN\ (= IM'^{-1})$ | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| $IN'\ (= IN'^{-1})$ | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |

Lower part (additional relations under nondense time): 

| Interaction type | R1(X,Y) | R2(X,Y) | R3(X,Y) | R4(X,Y) | S1(X,Y) | S2(X,Y) | R1(Y,X) | R2(Y,X) | R3(Y,X) | R4(Y,X) | S1(Y,X) | S2(Y,X) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $ID''\ (= (IUX)^{-1})$ | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| $IE''\ (= (ITW)^{-1})$ | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| $IL''\ (= (IOP)^{-1})$ | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| $IM''\ (= (IMN)^{-1})$ | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| $IN''\ (= (IMN')^{-1})$ | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| $IMN''\ (= (IMN'')^{-1})$ | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
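
The counts quoted above (29 and 11 interaction types, with three and one self-inverse types respectively) can be checked mechanically from the boolean vectors of Table 3. The sketch below assumes that the inverse of an interaction type is obtained by swapping the six bits for $r(X, Y)$ with the six bits for $r(Y, X)$; the dictionary names and the helper functions are hypothetical, and each string lists the bits in the order R1, R2, R3, R4, S1, S2 for $r(X, Y)$ followed by the same six for $r(Y, X)$.

```python
# One 12-bit vector per row of Table 3.
DENSE = {
    "IA": "111100000000", "IB": "011100000000", "IC": "001110000000",
    "ID": "001111010100", "ID'": "001101010101", "IE": "001111000100",
    "IE'": "001101000101", "IF": "011101000101", "IG": "000010000010",
    "IH": "000110000010", "II": "010100000010", "IL": "000111010100",
    "IL'": "000101010101", "IM": "000110000110", "IN": "000111000100",
    "IN'": "000101000101",
}
NONDENSE = {
    "ID''": "001101010100", "IE''": "001101000100", "IL''": "000101010100",
    "IM''": "000100000010", "IN''": "000101000100", "IMN''": "000100000100",
}


def inverse(bits):
    """Swap the r(X, Y) half with the r(Y, X) half."""
    return bits[6:] + bits[:6]


def expand(rows):
    """Return all vectors (rows plus inverses) and the self-inverse row names."""
    vectors = set()
    self_inverse = []
    for name, bits in rows.items():
        vectors.add(bits)
        vectors.add(inverse(bits))
        if bits == inverse(bits):
            self_inverse.append(name)
    return vectors, self_inverse


if __name__ == "__main__":
    dense_vectors, dense_self = expand(DENSE)
    nondense_vectors, nondense_self = expand(NONDENSE)
    print(len(dense_vectors), dense_self)
    print(len(nondense_vectors), nondense_self)
    print(len(dense_vectors | nondense_vectors))
```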



2.3 Modalities for Global Predicates 

Observe that for any predicate $\phi$, three orthogonal relational possibilities hold 
under the Possibly/Definitely classification: (i) $Definitely(\phi)$, 
(ii) $\neg Definitely(\phi) \wedge Possibly(\phi)$, (iii) $\neg Possibly(\phi)$. 
Table 4. Refinement mapping [10]. The upper part shows the 29 mappings when 
the dense time model is assumed. With the nondense time model, the 11 additional 
mappings in the lower part also apply. 

| Part | $Definitely(\phi)$ | $Possibly(\phi) \wedge \neg Definitely(\phi)$ | $\neg Possibly(\phi)$ |
|---|---|---|---|
| Upper (dense time) | ID and IX; ID' and IU; IE and IW; IE' and IT; IF and IS; IO and IL; IP and IL'; IM; IM' and IN; IN' | IB and IR; IC and IV; IG; IH and IK; II and IJ | IA and IQ |
| Lower (nondense time) | ID'' and IUX; IE'' and ITW; IL'' and IOP; IN'' and IMN' | IM'' and IMN; IMN'' | |


Conjunctive predicates form an important class of predicates and have been 
studied extensively [2, 6, 7, 8]. Based on the definitions of the orthogonal 
temporal interactions [9], the 3 orthogonal relational possibilities based on the 
Possibly/Definitely definitions were refined into the exhaustive set of 40 
possibilities [10]. Table 4 shows this refinement mapping, assuming that the 
conjunctive predicate is defined on two processes. When conjunctive predicate $\phi$ is 
defined over variables that are local to $n > 2$ processes, one can still express the 
three possibilities (i) $Definitely(\phi)$, (ii) $\neg Definitely(\phi) \wedge Possibly(\phi)$, and 
(iii) $\neg Possibly(\phi)$, in terms of the fine-grained 40 independent relations between 
$\binom{n}{2}$ pairs of intervals. Note that not all $40^{\binom{n}{2}}$ combinations will be valid; the 
combinations have to satisfy the axiom system given in [9]. 

For n > 2 processes, the refinement mappings of the Possibly and Definitely 
modalities are given by Theorem 1 [10]. 

Theorem 1. [10] Consider a conjunctive predicate $\phi = \bigwedge_i \phi_i$. The following 
results are implicitly qualified over a set of intervals, containing one interval per 
process. 

— $Definitely(\phi)$ holds if and only if $\bigwedge_{(\forall i \in N)(\forall j \in N)} [Definitely(\phi_i \wedge \phi_j)]$ 

— $\neg Definitely(\phi) \wedge Possibly(\phi)$ holds if and only if 

• $(\exists i \in N)(\exists j \in N)\ \neg Definitely(\phi_i \wedge \phi_j) \wedge \left(\bigwedge_{(\forall i \in N)(\forall j \in N)} [Possibly(\phi_i \wedge \phi_j)]\right)$ 

— $\neg Possibly(\phi)$ holds if and only if $(\exists i \in N)(\exists j \in N)\ \neg Possibly(\phi_i \wedge \phi_j)$ 

Consider the following example (from [10]) of the extra information provided 
by the fine-grained modalities. Let $\phi$ be $a_i = 2 \wedge b_j = 3 \wedge c_k = 5$. Let $a_i$, $b_j$, 
and $c_k$ be 2, 3, and 5, respectively, in intervals $X_i$, $Y_j$, and $Z_k$, respectively, and 
let $ID(X_i, Y_j)$, $IV(Y_j, Z_k)$, and $IN(Z_k, X_i)$ be true. This is shown in Figure 1. 
Then by Theorem 1, we have (i) $Definitely(a_i = 2 \wedge b_j = 3)$, 
(ii) $Possibly(b_j = 3 \wedge c_k = 5)$ and $\neg Definitely(b_j = 3 \wedge c_k = 5)$, and 
(iii) $Definitely(a_i = 2 \wedge c_k = 5)$. By Theorem 1, we have the modality 
$Possibly(\phi) \wedge \neg Definitely(\phi)$. Conversely, if $Possibly(\phi) \wedge \neg Definitely(\phi)$ is 
known in the classical coarse-grained classification, the fine-grained classification 
gives the added information: $ID(X_i, Y_j)$, $IV(Y_j, Z_k)$, and $IN(Z_k, X_i)$. 

[Figure: timing diagram of intervals X, Y, and Z, marking min and max of each 
interval and the pairwise relations ID(X,Y), IX(Y,X); IV(Y,Z), IC(Z,Y); IN(Z,X), IM'(X,Z).] 

Fig. 1. Example [10] to show fine-grained relations across n > 2 processes. 
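
The way Theorem 1 combines pairwise results can be sketched in a few lines. The code below is an illustration only: it uses the dense-time part of Table 4 to map each pairwise interaction type to a coarse modality and then applies the three cases of the theorem. The names `COARSE` and `global_modality` are hypothetical, and the input is assumed to list one interaction type for every pair of intervals.

```python
DEFINITELY = "Definitely"
POSSIBLY_ONLY = "Possibly and not Definitely"
NOT_POSSIBLY = "not Possibly"

# Dense-time part of Table 4: interaction type -> coarse modality for the pair.
COARSE = {}
for name in "ID IX ID' IU IE IW IE' IT IF IS IO IL IP IL' IM IM' IN IN'".split():
    COARSE[name] = DEFINITELY
for name in "IB IR IC IV IG IH IK II IJ".split():
    COARSE[name] = POSSIBLY_ONLY
for name in "IA IQ".split():
    COARSE[name] = NOT_POSSIBLY


def global_modality(pairwise_types):
    """Combine the interaction types of all interval pairs, as in Theorem 1."""
    coarse = [COARSE[t] for t in pairwise_types]
    if any(c == NOT_POSSIBLY for c in coarse):
        return NOT_POSSIBLY          # some pair can never hold together
    if all(c == DEFINITELY for c in coarse):
        return DEFINITELY            # every pair definitely holds together
    return POSSIBLY_ONLY             # all pairs possible, at least one not definite


if __name__ == "__main__":
    # The example from the text: ID(X, Y), IV(Y, Z), IN(Z, X).
    print(global_modality(["ID", "IV", "IN"]))
```

For the example in the text, the pair related by IV is only Possibly, so the combined result is Possibly and not Definitely.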



2.4 Objective 

Our objective is to solve Fine-Poss and Fine-Def, i.e., to detect $Possibly(\phi)$ 
and $Definitely(\phi)$, for conjunctive predicates, with the added information of 
the exact interaction type between each pair of intervals (one at each process) 
when $Possibly(\phi)$ and $Definitely(\phi)$ are true. The extra information about 
the pairwise interaction type is useful, as shown in [10] by considering various 
applications. Another use of the extra information is in multi-player distributed 
games. The overheads of our algorithms are the same as those of the earlier 
algorithms, [GW94] [6] for $Possibly(\phi)$ and [GW96] [7] for $Definitely(\phi)$, that 
can detect only whether $Possibly(\phi)$ is true and whether $Definitely(\phi)$ is true. 
Table 1 compares all the performance metrics. 

3 Detecting Predicates: Framework and Data Structures 

Given a conjunctive predicate, for each pair of intervals belonging to different 
processes, each of the 29 (40) possible independent relations in the dense 
(nondense) model of time can be tested for using the bit-patterns for the dependent 
relations, as given in Table 3. The tests for the relations R1, R2, R3, R4, S1, and 
S2 are given in the third column of Table 2 using vector timestamps. Recall that 
the interval at a process is used to identify the period when some local property 
(using which the predicate $\phi$ is defined) holds. $V_i^-$ and $V_i^+$ denote the vector 
timestamps at process $P_i$ at the start of an interval and the end of an interval, 
respectively. $V_i^x$ denotes the vector timestamp of event $x_i$ at $P_i$. 

1. When an internal event or send event occurs at process $P_i$, $V_i[i] = V_i[i] + 1$. 

2. Every message contains the vector clock and Interval Clock of its send event. 

3. When process $P_i$ receives a message $msg$, then $\forall j$ do, 

   if ($j == i$) then $V_i[i] = V_i[i] + 1$, 
   else $V_i[j] = \max(V_i[j], msg.V[j])$. 

4. When an interval starts at $P_i$ (local predicate $\phi_i$ becomes true), $I_i[i] = V_i[i]$. 

5. When process $P_i$ receives a message $msg$, then $\forall j$ do, 

   $I_i[j] = \max(I_i[j], msg.I[j])$. 

Fig. 2. The vector clock and Interval Clock. 

The tests in Table 2 can be run by a central server along the lines of the 
algorithms in [4,6,7,8,14]. Processes $P_1, P_2, \ldots, P_n$ send the timestamps of their 
intervals and certain other local information to the server $P_0$, which maintains 
queues $Q_1, Q_2, \ldots, Q_n$ for each of the processes. We require that the central 
server $P_0$ receive the updates from each $P_i$, $1 \leq i \leq n$, in FIFO order. For each 
of the problems to be solved, the server runs different algorithms to process the 
interval information in the queues. We assume that interval $X$ occurs at $P_i$ and 
interval $Y$ occurs at $P_j$. For any two intervals $X$ and $X'$ that occur at the same 
process, if $R1(X, X')$, then we say that $X$ is a predecessor of $X'$ and $X'$ is a 
successor of $X$. Also, there are a maximum of $p$ intervals at any process. 

The operations and data structures required by the algorithms to solve 
Problems Fine-Poss and Fine-Def can be divided into two parts. The first, common 
to all the algorithms, runs on each of the $n$ processes $P_1$ to $P_n$, and is given in 
this section. The second part of each algorithm runs on the central process $P_0$ 
and is presented in later sections. 



3.1 Log Operations 

Each process $P_i$, where $1 \leq i \leq n$, maintains the following data structures. (1) 
$V_i$: array[1..n] of integer. This is the Vector Clock. (2) $I_i$: array[1..n] of integer. 
This is an Interval Clock which tracks the latest intervals at processes. $I_i[j]$ is the 
timestamp $V_j[j]$ when $\phi_j$ last became true. (3) $Log_i$: Contains the information 
to be sent to the central process. Figure 2 shows how to maintain the vector 
clock and Interval Clock. 
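
The five rules of Figure 2 translate almost line for line into code. The following is a minimal sketch, not the authors' implementation: the `Process` class, its method names, and the dictionary used as a message are assumptions made for illustration, and processes are indexed from 0 rather than 1.

```python
class Process:
    """Maintains the vector clock V and the Interval Clock I of one process."""

    def __init__(self, index, n):
        self.i = index
        self.V = [0] * n
        self.I = [0] * n

    def internal_event(self):
        self.V[self.i] += 1                                  # rule 1

    def send(self):
        self.V[self.i] += 1                                  # rule 1
        return {"V": list(self.V), "I": list(self.I)}        # rule 2

    def receive(self, msg):
        for j in range(len(self.V)):
            if j == self.i:
                self.V[j] += 1                               # rule 3, own entry
            else:
                self.V[j] = max(self.V[j], msg["V"][j])      # rule 3, others
            self.I[j] = max(self.I[j], msg["I"][j])          # rule 5

    def start_interval(self):
        self.I[self.i] = self.V[self.i]                      # rule 4


if __name__ == "__main__":
    p0, p1 = Process(0, 2), Process(1, 2)
    p0.internal_event()      # local predicate at p0 becomes true here
    p0.start_interval()
    p1.receive(p0.send())
    print(p0.V, p0.I, p1.V, p1.I)
```

After the receive, the second process knows both the sender's clock value and the start of the sender's latest interval.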

To maintain $Log_i$, the required data structures and operations are defined in 
Figure 3. $Log_i$ is constructed and sent to the central process using the protocol 
shown. The central process uses the Log to determine the relationship between 
two intervals. 
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type Event_Interval = record 
intervaLid : integer; 
locaLevent: integer; 

end 



type Log = record 

start: array[l..n] of integer; 
end: array[l..n] of integer; 
pJog: array[l..n] of Process-Log\ 

end 



type ProcesS-Log = record 

event-intervaLgueue: queue of Event-Interval; 

end 



Start of an interval: 

Logi. start = V~. / /Store the timestamp V~ of the starting of the interval. 

On receiving a message during an interval: 

if (change in L) then //Store local component of vector clock and intervaLid 
for each k such that Ii[k] was changed //which caused the change in L 
insert {Ii\k],Vi\i]) in Logi.pJog[k].eventAntervaLqueue. 

End of interval: 

Logi. end = V/*" //Store the timestamp PA of the end of the interval, 

if (a receive or send event occurs between start of previous interval and end of 
current interval) then 

Send Logi to central process. 

Fig. 3. Data structures and operations to construct Log at Pi (1 < i < n). 
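The operations of Figure 3 can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the function names (`begin_interval`, `on_receive`, `end_interval`), the 0-based process indices, and the choice to pass the Interval Clock before and after the receive as two lists are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class EventInterval:
    interval_id: int  # new value of I_i[k] after the receive
    local_event: int  # V_i[i] at the receive that caused the change


@dataclass
class Log:
    start: list  # vector timestamp at the start of the interval
    end: list    # vector timestamp at the end of the interval
    p_log: list  # p_log[k] is a queue of EventInterval entries for process k


def begin_interval(vector_clock):
    """Start of an interval: record the starting vector timestamp."""
    n = len(vector_clock)
    return Log(start=list(vector_clock), end=[], p_log=[deque() for _ in range(n)])


def on_receive(log, i, old_interval_clock, new_interval_clock, vector_clock):
    """On a receive at process i: log every Interval Clock component that changed."""
    for k, (old, new) in enumerate(zip(old_interval_clock, new_interval_clock)):
        if new != old:
            log.p_log[k].append(EventInterval(new, vector_clock[i]))


def end_interval(log, vector_clock, communicated):
    """End of interval: record the ending timestamp.

    Returns True when the Log should be sent to the central process, i.e.
    when a send or receive occurred since the start of the previous interval
    (the caller is assumed to track this in `communicated`).
    """
    log.end = list(vector_clock)
    return communicated
```

Only the components of the Interval Clock that actually change produce an Event_Interval entry, which is what bounds the Log size in the analysis of Section 3.2.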



3.2 Complexity Analysis at $P_i$ ($1 \le i \le n$)

Space complexity of Log. Each Log stores the start ($V_i^-$) and the end ($V_i^+$)
of an interval, which requires a maximum of $2np$ integers per process. Consider
the construction algorithm for Log. Besides the start and the end of each
interval, an Event_Interval is added to the Log for every component of Interval
Clock which is modified due to the receipt of a message. As a change in a
component of Interval Clock implies the start of a new interval on another
process, the total number of times the component of Interval Clock can change is
equal to the number of intervals on all the processes. Thus the total number of
Event_Intervals which can be added to the Log of a single process is $(n-1)p$.
This takes $2(n-1)p$ integers per process. The total space needed for Logs
corresponding to all $p$ intervals on a single process is $2(n-1)p + 2np$. This gives an
average of $4n-2$ integers per Log. As only one Log exists at a time, the average
space requirement at a process $P_i$ ($1 \le i \le n$) at any time is the sum of the space
required by the vector clock, the Interval Clock, and the Log, which is $6n-2$ integers.
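The arithmetic behind the $4n-2$ and $6n-2$ figures can be checked with a few lines of Python. The function names are hypothetical and the sketch simply restates the counts given above.

```python
def log_space_per_process(n, p):
    """Integers needed for all p Logs of one process."""
    timestamps = 2 * n * p             # start and end vector timestamps
    event_intervals = 2 * (n - 1) * p  # two integers per Event_Interval
    return timestamps + event_intervals


def average_log_size(n, p):
    """Average integers per Log: (2np + 2(n-1)p) / p = 4n - 2."""
    return log_space_per_process(n, p) // p


def average_space_at_process(n, p):
    """Vector clock (n) + Interval Clock (n) + one Log (4n - 2) = 6n - 2."""
    return n + n + average_log_size(n, p)
```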



Message complexity of control messages sent to the central process $P_0$
by processes $P_1$ to $P_n$. This can be determined in two ways. As one message is
sent per interval, the number of messages is $O(p)$ for each $P_i$ ($i \neq 0$). This gives
a total message complexity of $O(np)$. On the average, the size of each message is
$4n-2$ as each message contains the Log. The total message space overhead for a
particular process is the sum of all the Logs for that process, which was shown to
be $4np - 2p$. Hence the total message space complexity is $4n^2p - 2np = O(n^2p)$.

An optimization of message size. The Log corresponding to an interval is
sent to the central process only if the relationship between the interval and all
other intervals (at other processes) is different from the relationship which its
predecessor interval had with all the other intervals (at other processes). Two
successive intervals $Y$ and $Y'$ on process $P_j$ will have the same relationship if no
message is sent or received by $P_j$ between the start of $Y$ and the end of $Y'$. For
each message exchanged between processes, a maximum of four interval Logs
need to be sent to the central process, because two successive intervals ($Y$ and
$Y'$) will have different relationships if a receive or a send occurs between the
start of $Y$ and the end of $Y'$. This makes it necessary to send two interval Logs
for a send event and two for a receive event. Hence if there are $m_s$
messages exchanged between all processes, then a total of $4m_s$ intervals need to
be sent to the central process in $4m_s$ control messages, while the total message
space overhead is $2m_s n + 4m_s \cdot 2n = 10 m_s n$. The term $2m_s n$ arises because for
every message sent, each other process eventually (due to transitive propagation
of Interval Clock) may need to insert an Event_Interval tuple in its Log. This
can generate $2nm_s$ overhead in Logs across the $n$ processes. The term $4m_s \cdot 2n$
arises because the vector clock at the start and end of each interval is sent with
each message.

Hence, the total number of control messages sent to the central process and
the total message space overhead is the lesser of when either four intervals are
sent for each message exchanged or when all the intervals are sent. Thus the
total number of messages sent is $O(\min(4m_s, np))$ and the total message space
overhead is $O(\min(4n^2p - 2np, 10nm_s))$.
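As a quick numeric illustration of these two bounds, the following sketch evaluates both alternatives and takes the smaller one. The function name and the sample values are assumptions chosen for the example; the formulas are the ones derived above.

```python
def control_message_bounds(n, p, m_s):
    """Return (number of control messages, message space in integers).

    n   : number of processes P_1..P_n
    p   : maximum number of intervals per process
    m_s : number of messages exchanged between the processes
    """
    # Case 1: every interval is sent to the central process.
    all_msgs = n * p
    all_space = 4 * n * n * p - 2 * n * p
    # Case 2: at most four interval Logs are sent per exchanged message.
    opt_msgs = 4 * m_s
    opt_space = 2 * m_s * n + 4 * m_s * 2 * n  # = 10 * n * m_s
    return min(opt_msgs, all_msgs), min(opt_space, all_space)
```

When few application messages are exchanged relative to the number of intervals, the optimized case dominates; with heavy communication the per-interval bound takes over.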

4 Algorithm Fine_Rel: Detecting Fine-Grained Relations 

To solve Problems Fine_Poss and Fine_Def, we first state and solve an
intermediate problem.

Problem Fine_Rel Statement: Given a relation $r_{i,j}$ from $\Re$ for each pair of
processes $P_i$ and $P_j$, determine on-line the intervals (if they exist), one from
each process, such that each relation $r_{i,j}$ is satisfied by the ($P_i$, $P_j$) pair.

Note that the given relations $\{r_{i,j}, \forall i,j\}$ need to satisfy the axioms on $\Re$
[9] for a solution to potentially exist. A distributed and more complex method
to solve Fine_Rel, using the data structures of Figure 3 and Theorem 2 below,
without proofs or a complexity analysis, is presented in [3].

Algorithm Overview: The algorithm detects a set of intervals, one on each
process, such that each pair of intervals satisfies the relationship specified for
that pair of processes. If no such set of intervals exists, the algorithm does not
return any interval set. The central process $P_0$ maintains $n$ queues, one for Logs
from each process, and determines which orthogonal relation holds between pairs
of intervals. The queues are processed using "pruning", described later. If there
exists an interval at the head of each queue and these intervals cannot be pruned,
then these intervals satisfy $r_{i,j}$ $\forall i,j$, where $i \neq j$ and $1 \le i,j \le n$, and hence
these intervals form a solution set.

For S2(X, Y)

1. // Eliminate from Log of interval Y (on P_j), all receives of messages
   // which were sent by P_i before the start of interval X (on P_i).
   (1a) for each event_interval ∈ Log_j.p_log[i].event_interval_queue
   (1b)     if (event_interval.interval_id < Log_i.start[i]) then
   (1c)         delete event_interval.

2. // Select from the pruned Log, the earliest message sent from interval X to Y.
   (2a) temp = ∞
   (2b) if (Log_j.start[i] > Log_i.start[i]) then temp = Log_j.start[j]
   (2c) else
   (2d)     for each event_interval ∈ Log_j.p_log[i].event_interval_queue
   (2e)         temp = min(temp, event_interval.local_event).

3. if (Log_i.end[j] > temp) then S2(X, Y) is true.

For S1(Y, X)

1. Same as step 1 of the algorithm to determine S2(X, Y).

2. Same as step 2 of the algorithm to determine S2(X, Y).

3. if (Log_i.end[j] < temp) and (temp > Log_j.start[j]) then S1(Y, X) is true.

Fig. 4. Tests for coupling relations $S1(X, Y)$ and $S2(Y, X)$ at $P_0$.
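The tests of Figure 4 can be sketched in Python as follows. This is an illustrative reading of the figure rather than a definitive implementation. Each Log is assumed to be a dictionary with keys `start`, `end`, and `p_log`, where `p_log[k]` is a list of `(interval_id, local_event)` pairs; these names and the 0-based indices are choices made for the example. The comparison operators are taken exactly as they appear in the figure, and step 1 is done by filtering a copy instead of deleting entries in place.

```python
import math


def earliest_message_time(log_i, log_j, i, j):
    """Steps 1 and 2 of Figure 4: compute temp for intervals X (on i) and Y (on j)."""
    # Step 1: ignore receives of messages sent by P_i before the start of X.
    pruned = [
        (interval_id, local_event)
        for (interval_id, local_event) in log_j["p_log"][i]
        if not (interval_id < log_i["start"][i])
    ]
    # Step 2: pick the earliest message sent from X to Y.
    if log_j["start"][i] > log_i["start"][i]:
        return log_j["start"][j]
    temp = math.inf
    for _interval_id, local_event in pruned:
        temp = min(temp, local_event)
    return temp


def s2(log_i, log_j, i, j):
    """Step 3 of the test for S2(X, Y)."""
    return log_i["end"][j] > earliest_message_time(log_i, log_j, i, j)


def s1(log_i, log_j, i, j):
    """Step 3 of the test for S1(Y, X)."""
    temp = earliest_message_time(log_i, log_j, i, j)
    return log_i["end"][j] < temp and temp > log_j["start"][j]
```

Each Event_Interval entry is examined once per test, which is the per-entry comparison cost used in the complexity analysis.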



We first define the function $S(r_{i,j})$ and the relation $\leadsto$. Recall that $X$ and $Y$
are intervals on $P_i$ and $P_j$, respectively, and $Y'$ is any interval that succeeds $Y$.

Definition 2. Function $S : \Re \rightarrow 2^{\Re}$ is defined to be $S(r_{i,j}) = \{R \in \Re \mid R \neq
r_{i,j} \wedge$ if $R(X,Y)$ is true then $r_{i,j}(X,Y')$ is false for all $Y'$ that succeed $Y\}$.

Intuitively, for each $r_{i,j} \in \Re$, we define a prohibition function $S(r_{i,j})$ as the
set of all relations $R$ such that if $R(X, Y)$ is true, then $r_{i,j}(X, Y')$ can never be
true for any successor $Y'$ of $Y$. $S(r_{i,j})$ is the set of relations that prohibit $r_{i,j}$
from being true in the future.

Two relations $R'$ and $R''$ in $\Re$ are related by the allows relation $\leadsto$ if the
occurrence of $R'(X,Y)$ does not prohibit $R''(X,Y')$ for some successor $Y'$ of $Y$.

Definition 3. $\leadsto$ is a relation on $\Re \times \Re$ such that $R' \leadsto R''$ if (1) $R' \neq R''$, and
(2) if $R'(X, Y)$ is true then $R''(X, Y')$ can be true for some $Y'$ that succeeds $Y$.

For example, $IC \leadsto IB$ because (1) $IC \neq IB$ and, (2) if $IC(X,Y)$ is true,
then there is a possibility that $IB(X, Y')$ is also true, where $Y'$ succeeds $Y$.

Lemma 1. If $R \in S(r_{i,j})$ then $R \not\leadsto r_{i,j}$, else if $R \notin S(r_{i,j})$ and $R \neq r_{i,j}$ then $R \leadsto r_{i,j}$.

Proof. If $R \in S(r_{i,j})$, using Definition 2, it can be inferred that $r_{i,j}(X,Y')$ is false
for all $Y'$ that succeed $Y$. This does not satisfy the second part of Definition 3.







Table 5. $S(r_{i,j})$ for the 40 independent relations in $\Re$. The upper part of the table
gives the function $S$ on 29 relations assuming dense time. The lower part of the table
gives function $S$ for the 11 additional relations assuming non-dense time.

Interaction type $r_{i,j}$ | $S(r_{i,j})$ | $S(r_{j,i})$
-------------------------------------------------------------------------------
$IA$ (= $IQ^{-1}$) | $\emptyset$ | $\Re - \{IQ\}$
$IB$ (= $IR^{-1}$) | $\{IA, IF, II, IP, IO, IU, IX, IUX, IOP\}$ | $\Re - \{IQ, IR\}$
$IC$ (= $IV^{-1}$) | $\{IA, IB, IF, II, IP, IO, IU, IX, IUX, IOP\}$ | $\Re - \{IQ, IV\}$
$ID$ (= $IX^{-1}$) | $\Re - \{IQ, IS, IR, IJ, IL, IL', IL'', ID, ID', ID''\}$ | $\Re - \{IQ, IX\}$
$ID'$ (= $IU^{-1}$) | $\Re - \{IQ, IS, IR, IJ, IL, IL', IL'', ID, ID', ID''\}$ | $\Re - \{IQ, IU\}$
$IE$ (= $IW^{-1}$) | $\Re - \{IQ, IS, IR, IJ, IL, IL', IL'', ID, ID', ID'', IE\}$ | $\Re - \{IQ, IW\}$
$IE'$ (= $IT^{-1}$) | $\Re - \{IQ, IS, IR, IJ, IL, IL', IL'', ID, ID', ID'', IE'\}$ | $\Re - \{IQ, IT\}$
$IF$ (= $IS^{-1}$) | $\Re - \{IQ, IS, IR, IJ, IL, IL', IL'', ID, ID', ID'', IF\}$ | $\Re - \{IQ, IS\}$
$IG$ (= $IG^{-1}$) | $\Re - \{IQ, IR, IJ, IV, IK, IG\}$ | $\Re - \{IQ, IR, IJ, IV, IK, IG\}$
$IH$ (= $IK^{-1}$) | $\Re - \{IQ, IR, IJ, IV, IK, IC, IH\}$ | $\Re - \{IQ, IR, IJ, IK\}$
$II$ (= $IJ^{-1}$) | $\Re - \{IQ, IR, IJ, IV, IK, IG, II\}$ | $\Re - \{IQ, IR, IJ\}$
$IL$ (= $IO^{-1}$) | $\Re - \{IQ, IR, IJ, IL\}$ | $\Re - \{IQ, IR, IJ, IO\}$
$IL'$ (= $IP^{-1}$) | $\Re - \{IQ, IR, IJ, IL'\}$ | $\Re - \{IQ, IR, IJ, IP\}$
$IM$ (= $IM^{-1}$) | $\Re - \{IQ, IR, IJ, IM\}$ | $\Re - \{IQ, IR, IJ, IM\}$
$IN$ (= $IM'^{-1}$) | $\Re - \{IQ, IR, IJ, IN\}$ | $\Re - \{IQ, IR, IJ, IM'\}$
$IN'$ (= $IN'^{-1}$) | $\Re - \{IQ, IR, IJ, IN'\}$ | $\Re - \{IQ, IR, IJ, IN'\}$
-------------------------------------------------------------------------------
$ID''$ (= $(IUX)^{-1}$) | $\Re - \{IQ, IS, IR, IJ, IL, IL', IL'', ID, ID', ID''\}$ | $\Re - \{IQ, IUX\}$
$IE''$ (= $(ITW)^{-1}$) | $\Re - \{IQ, IS, IR, IJ, IL, IL', IL'', ID, ID', ID'', IE''\}$ | $\Re - \{IQ, ITW\}$
$IL''$ (= $(IOP)^{-1}$) | $\Re - \{IQ, IR, IJ, IL''\}$ | $\Re - \{IQ, IR, IJ, IOP\}$
$IM''$ (= $(IMN)^{-1}$) | $\Re - \{IQ, IR, IJ, IM''\}$ | $\Re - \{IQ, IR, IJ, IMN\}$
$IN''$ (= $(IMN')^{-1}$) | $\Re - \{IQ, IR, IJ, IN''\}$ | $\Re - \{IQ, IR, IJ, IMN'\}$
$IMN''$ (= $(IMN'')^{-1}$) | $\Re - \{IQ, IR, IJ, IMN''\}$ | $\Re - \{IQ, IR, IJ, IMN''\}$

Hence $R \not\leadsto r_{i,j}$. If $R \notin S(r_{i,j})$ and $R \neq r_{i,j}$, it follows that $r_{i,j}$ can be true for
some $Y'$ that succeeds $Y$. This satisfies Definition 3 and hence $R \leadsto r_{i,j}$. □

Table 5 gives $S(r_{i,j})$ for each of the 40 interaction types in $\Re$. The table is
constructed by analyzing each interaction pair in $\Re$. We now state an important
result between any two relations in $\Re$ that satisfy the "allows" relation, and the
existence of the "allows" relation between their respective inverses. Specifically,
if $R'$ allows $R''$, then Theorem 2 states that $R'^{-1}$ necessarily does not allow
relation $R''^{-1}$. The theorem can be observed to be true from Lemma 1 and
Table 5 by a case-by-case analysis.

Theorem 2. For $R', R'' \in \Re$, if $R' \leadsto R''$ then $R'^{-1} \not\leadsto R''^{-1}$.

Taking the same example, $IC \leadsto IB \Rightarrow IV (= IC^{-1}) \not\leadsto IR (= IB^{-1})$, which
is indeed true. Note that $R' \neq R''$ in the definition of relation $\leadsto$ is necessary;
otherwise $R' \leadsto R'$ will become true and from Theorem 2, we get $R'^{-1} \not\leadsto R'^{-1}$,
which leads to a contradiction.

Lemma 2. If the relationship $R(X,Y)$ between intervals $X$ and $Y$ (belonging
to processes $P_i$ and $P_j$, resp.) is contained in the set $S(r_{i,j})$, then interval $X$ can
be removed from the queue $Q_i$.

Proof. From the definition of $S(r_{i,j})$, we get that $r_{i,j}(X, Y')$ cannot exist, where
$Y'$ is any successor interval of $Y$. Hence interval $X$ can never be a part of the
solution and can be deleted from the queue. □

Lemma 3. If the relationship between a pair of intervals $X$ and $Y$ (belonging to
processes $P_i$ and $P_j$ resp.) is not equal to $r_{i,j}$, then either interval $X$ or interval
$Y$ is removed from the queue.

Proof. We use contradiction. Assume relation $R(X,Y)$ ($\neq r_{i,j}(X,Y)$) is true
for intervals $X$ and $Y$. From Lemma 2, the only time neither $X$ nor $Y$ will be
deleted is when $R \notin S(r_{i,j})$ and $R^{-1} \notin S(r_{j,i})$. From Lemma 1, it can be inferred
that $R \leadsto r_{i,j}$ and $R^{-1} \leadsto r_{j,i}$. As $r_{i,j}^{-1} = r_{j,i}$, we get $R \leadsto r_{i,j}$ and $R^{-1} \leadsto r_{i,j}^{-1}$. This
is a contradiction as by Theorem 2, $R \leadsto r_{i,j} \Rightarrow R^{-1} \not\leadsto r_{i,j}^{-1}$. Hence $R \in S(r_{i,j})$ or
$R^{-1} \in S(r_{j,i})$ or both, and thus at least one of the intervals can be deleted. □

Theorem 3. Algorithm Fine_Rel run by $P_0$ in Figure 5 solves Problem Fine_Rel.

Proof. Lemma 2, which allows queues to be pruned correctly, is implemented
in the algorithm at $P_0$. The algorithm deletes interval $X$ as soon as $R(X, Y) \in
S(r_{i,j})$ (lines 13-17). Similarly, $Y$ is deleted if $R(Y,X) \in S(r_{j,i})$ (lines 15-17).
Thus, an interval gets deleted only if it cannot be part of the solution. Also
clearly, each interval gets processed unless a solution is found using a
predecessor interval from the same process. Lemma 3 gives the unique property that if
$R(X,Y) \neq r_{i,j}$, then either interval $X$ or interval $Y$ is deleted. A consequence
of this property is that if every queue is non-empty and their heads cannot be
pruned, then a solution exists and the set of intervals at the head of each queue
forms a solution.

The set updatedQueues stores the indices of all the queues whose heads got
updated. In each iteration of the while loop, the indices of all the queues whose
heads satisfy Lemma 2 are stored in set newUpdatedQueues (lines (13)-(16)). In
lines (17) and (18), the heads of all these queues are deleted and indices of the
updated queues are stored in the set updatedQueues. Observe that only interval
pairs which were not compared earlier are compared in subsequent iterations
of the while loop. The loop runs until no more queues can be updated. If at
this stage all the queues are non-empty, then a solution is found (follows from
Lemma 3). If a solution is found, then for the intervals $X$ (at $P_i$) and $Y$ (at $P_j$)
stored at the heads of these queues, we have $R(X,Y) = r_{i,j}$. □
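The pruning loop described in this proof can be sketched in Python as follows. It is a hedged illustration of the control flow of Figure 5 only. The callbacks `relation` (which returns the relation holding between two intervals) and `prohibits` (which answers whether a relation lies in $S(r_{i,j})$) are hypothetical stand-ins for the tests of Figure 4 and for Table 5. The sketch also skips the case $j = i$, an assumption consistent with the $n - 1$ relations counted per interval in the complexity analysis.

```python
from collections import deque


def on_interval(queues, z, interval, relation, prohibits):
    """Handle an interval arriving from process z; return a solution or None.

    queues    : list of deques, one per process
    relation  : relation(x, y) -> name of the relation R(x, y)
    prohibits : prohibits(i, j, r) -> True iff r is in S(r_ij)
    """
    queues[z].append(interval)                       # line (1)
    if len(queues[z]) != 1:                          # line (2)
        return None
    updated = {z}                                    # line (3)
    while updated:                                   # line (4)
        new_updated = set()                          # line (5)
        for i in updated:                            # line (6)
            if not queues[i]:                        # line (7)
                continue
            x = queues[i][0]                         # line (8)
            for j in range(len(queues)):             # line (9)
                if j == i or not queues[j]:          # line (10)
                    continue
                y = queues[j][0]                     # line (11)
                if prohibits(i, j, relation(x, y)):  # lines (12)-(13)
                    new_updated.add(i)               # line (14)
                if prohibits(j, i, relation(y, x)):  # line (15)
                    new_updated.add(j)               # line (16)
        for k in new_updated:                        # line (17)
            queues[k].popleft()
        updated = new_updated                        # line (18)
    if all(queues):                                  # line (19)
        return [q[0] for q in queues]                # line (20)
    return None
```

A head is deleted only after every comparison of the current iteration has finished, so an index added to `new_updated` always refers to a non-empty queue.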

For processes $P_1$ to $P_n$, the space complexity was shown in Section 3.2 to be
on average $O(n)$ at each process. Using the optimization in Section 3.2, the total
number of messages sent is equal to $O(\min(4m_s, pn))$ and the total message
space complexity is $O(\min((4n-2)np, 10nm_s))$.

Theorem 4. Algorithm Fine_Rel has the following complexities.

- The total message space complexity is $O(\min((4n-2)np, 10nm_s))$.

- The total space complexity at process $P_0$ is $O(\min((4n-2)np, 10nm_s))$.

- The average time complexity at $P_0$ is $O((n-1)\min(4m_s, pn))$. This is
equivalent to $O(n^2 M)$, where $M$ is the maximum number of entries in a queue.

Proof. For the central process $P_0$, the total space required is $O(\min((4n-2)np,
10nm_s))$ because the total space overhead at $P_0$ is equal to the total message
space complexity, which was computed in Section 3.2.

The time complexity is the product of the number of steps required to
determine a relationship and the number of relations determined.

Consider the first part of the product.







queue of Log: Q_1, Q_2, ..., Q_n = ⊥
set of int: updatedQueues, newUpdatedQueues = {}

On receiving interval from process P_z at P_0
(1)  Enqueue the interval onto queue Q_z
(2)  if (number of intervals on Q_z is 1) then
(3)      updatedQueues = {z}
(4)      while (updatedQueues is not empty)
(5)          newUpdatedQueues = {}
(6)          for each i ∈ updatedQueues
(7)              if (Q_i is non-empty) then
(8)                  X = head of Q_i
(9)                  for j = 1 to n
(10)                     if (Q_j is non-empty) then
(11)                         Y = head of Q_j
(12)                         Test for R(X, Y) using the tests in Fig. 4 and Tab. 2
(13)                         if (R(X, Y) ∈ S(r_{i,j})) then
(14)                             newUpdatedQueues = {i} ∪ newUpdatedQueues
(15)                         if (R(Y, X) ∈ S(r_{j,i})) then
(16)                             newUpdatedQueues = {j} ∪ newUpdatedQueues
(17)         Delete heads of all Q_k where k ∈ newUpdatedQueues
(18)         updatedQueues = newUpdatedQueues
(19)     if (all queues are non-empty) then
(20)         solution found. Heads of queues identify intervals that form the solution.

Fig. 5. On-line algorithm Fine_Rel at $P_0$.



- The total number of interval pairs between any two processes $P_i$ and $P_j$ is
$p^2$. To determine $R1(X, Y)$ to $R4(X, Y)$ and $R1(Y, X)$ to $R4(Y, X)$, as eight
comparisons are needed for each interval pair, a total of $8p^2$ comparisons are
necessary for any pair of processes.

- To determine the number of comparisons required by $S1$ and $S2$, consider the
maximum number of Event_Intervals stored in Log_j.p_log[i] that are sent
over the execution lifetime to the central process as part of the Logs. This
is the maximum number of Event_Intervals corresponding to $P_i$ stored in
$Q_j$ over $P_j$'s execution lifetime. An Event_Interval is added to Log_j.p_log[i]
only when there is a change in the $i$th component of Interval Clock at the
receive of a message. As the $i$th component of Interval Clock changes only
when a new interval starts, the total number of times the $i$th component
of Interval Clock changes is at most equal to $p$, the maximum number of
intervals occurring on the other process $P_i$. From Figure 4, it can be observed
that for each Event_Interval, there is one comparison. Thus, to determine
the relationship between an interval on $P_j$ and all other intervals on $P_i$, the
number of comparisons is equal to $p$. As there are $p$ intervals on $P_j$, a total of
$p^2$ comparisons are required to determine $S1$ or $S2$. Hence the total number
of comparisons to determine $S1(X, Y)$, $S2(X, Y)$, $S1(Y, X)$, and $S2(Y, X)$
is $4p^2$.

This gives a total of $8p^2 + 4p^2 = O(p^2)$ comparisons to determine the relation
between each pair of intervals on a pair of processes. As there are a total of
$p^2$ interval pairs between two processes, the average number of comparisons
required to determine a relationship is $O(1)$.

To analyse the second part of the product, consider Figure 5. For each
interval considered from one of the queues in updatedQueues (lines (6)-(12)), the
number of relations determined is $n-1$. Thus the number of relations
determined for each iteration of the while loop is $(n-1)|updatedQueues|$, where
$|updatedQueues|$ denotes the number of entries in updatedQueues. The
cumulative $\sum |updatedQueues|$ over all iterations of the while loop is less than the
total number of intervals over all the queues. Thus, the total number of relations
determined is less than $(n-1)\min(4m_s, pn)$, where $\min(4m_s, np)$ is the upper
bound on the total number of intervals over all the queues. As the average time
required to determine a relationship is $O(1)$, the average time complexity of the
algorithm is equal to $O((n-1)\min(4m_s, pn))$.

The average time complexity can be equivalently expressed using $M$, the
maximum number of entries in a queue, as follows. The total number of intervals
over all the queues is $O(nM)$. As the total number of relations determined is
$(n-1) \sum |updatedQueues|$ over all the iterations of the while loop, this is equivalent
to $(n-1) \cdot nM = O(n^2 M)$. This is also the average time complexity because it
takes $O(1)$ time on the average to determine a relationship. □

Table 1 compares the complexities of Fine_Rel with those of GW94 [6] and
GW96 [7]. GW94 and GW96 computed their time complexity at $P_0$ as only
$O(n^2 M)$, not in terms of $m_s$ or $p$. They did not give the space complexity at $P_0$.
As each control message in GW94 and GW96 carries a fixed size $O(n)$ message
overhead and a control message is sent to $P_0$ for every message send/receive
event, we have computed their total space complexity and average time
complexity at $P_0$ as $O(n m_s)$. This enables a direct comparison with the complexities
of our algorithm. Further, we have also computed our average time complexity
using $M$, as $O(n^2 M)$. In our algorithm, note that $M \le p$; $M = p$ if the message
overhead optimization is not used. We do not express the total space at $P_0$ in
terms of $M$ because the queue entries are of variable size, with an average size
of $(4n-2)$ integers.

5 Algorithms Fine_Poss and Fine_Def 

By leveraging Theorem 1 and the mapping of fine-grained modalities to $Possibly$
and $Definitely$ modalities, as given in Table 4, we address the problems of
determining whether $Possibly(\phi)$ and $Definitely(\phi)$ hold. If either of these two
coarse-grained modalities holds, we can also determine the exact fine-grained
orthogonal relation/modality between each pair of processes, unlike any
previous algorithm. Further, the time, space, and message complexities of the
proposed on-line (centralized) detection algorithms (Algorithms Fine_Poss and
Fine_Def) to detect $Possibly$ and $Definitely$ in terms of the fine-grained
modalities per pair of processes, are the same as those of the earlier on-line
(centralized) algorithms [6,7] that can detect only whether the $Possibly$ and
$Definitely$ modalities hold.

Recall that $\Re$ is a set of orthogonal relations and hence one and only one
relation from $\Re$ must hold between any pair of intervals. Consider the case where,
for each pair of processes ($P_i$, $P_j$), we are given a set $r^*_{i,j} \subseteq \Re$ such that we are
satisfied if some relation in $r^*_{i,j}$ holds. Now consider the objective where we need
to identify one interval per process such that for each process pair ($P_i$, $P_j$), some
relation in $r^*_{i,j}$ holds for that ($P_i$, $P_j$). Such an objective would be useful if we can
leverage the coarse-to-fine mapping of modalities, given in Table 4. We formalize
such an objective by generalizing the detection problem Fine_Rel to problem
Fine_Rel', as follows.

Problem Fine_Rel' Statement: Given a set of relations $r^*_{i,j} \subseteq \Re$ for each
pair of processes $P_i$ and $P_j$, determine on-line the intervals, if they exist, one
from each process, such that any one of the relations in $r^*_{i,j}$ is satisfied (by the
intervals) for each ($P_i$, $P_j$) pair. If a solution exists, identify the fine-grained
interaction from $\Re$ for each pair of processes in the first solution.

To solve Fine_Rel', given an arbitrary $r^*_{i,j}$, a solution based on algorithm
Fine_Rel (Figure 5) will not work because in the crucial tests in lines (13)-(14),
neither interval may be removable, and yet none of the relations from $r^*_{i,j}$ might
hold between the two intervals. This leads to deadlock! To see this further,
let $r1, r2 \in r^*_{i,j}$ and let $R(X,Y)$ hold, where $R \notin r^*_{i,j}$. Now let $R \in S(r1)$,
$R^{-1} \notin S(r1^{-1})$, $R \notin S(r2)$, $R^{-1} \in S(r2^{-1})$. Interval $X$ cannot be deleted
because $r2(X, Y')$ may be true for a successor $Y'$. Interval $Y$ cannot be deleted
because $r1^{-1}(Y,X')$ may be true for a successor $X'$. Therefore, a solution based
on Algorithm Fine_Rel will deadlock, and a more elaborate (and presumably
expensive) solution will be needed.

We now identify and define a special property, termed $\mathcal{CONVEXITY}$, on $r^*_{i,j}$
such that the deadlock is prevented. Informally, this property says that there is
no relation $R$ outside $r^*_{i,j}$ such that for any $r1, r2 \in r^*_{i,j}$, $R \leadsto r1$ and $R^{-1} \leadsto r2^{-1}$.
This property guarantees that when intervals $X$ and $Y$ are compared for $r^*_{i,j}$ and
$R(X, Y)$ holds, either $X$ or $Y$ or both get deleted, and hence there is progress.
The sets $r^*_{i,j}$, derived from Table 4, that need to be detected to solve Problems
Fine_Poss and Fine_Def satisfy this property. We therefore observe that
problems Fine_Poss and Fine_Def are special cases of Problem Fine_Rel' in which
the property $\mathcal{CONVEXITY}$ on $r^*_{i,j}$ is necessarily satisfied. To solve Problems
Fine_Poss and Fine_Def, we then use the generalizations of Lemmas 2 and 3,
as given in Lemmas 4 and 5, respectively, to first solve Fine_Rel'.

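Definition 4.

$\mathcal{CONVEXITY}$: $\forall R \notin r^*_{i,j}$ : $\left( \forall r_{i,j} \in r^*_{i,j},\ R \in S(r_{i,j}) \ \bigvee\ \forall r_{j,i} \in r^*_{j,i},\ R^{-1} \in S(r_{j,i}) \right)$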



Lemma 4. If the relationship $R(X, Y)$ between intervals $X$ and $Y$ (belonging to
processes $P_i$ and $P_j$, respectively) is contained in the set $\bigcap_{r_{i,j} \in r^*_{i,j}} S(r_{i,j})$, then
interval $X$ can be removed from the queue $Q_i$.

(13) if (R(X, Y) ∈ ∩_{r_{i,j} ∈ r*_{i,j}} S(r_{i,j})) then
(14)     newUpdatedQueues = {i} ∪ newUpdatedQueues
(15) if (R(Y, X) ∈ ∩_{r_{j,i} ∈ r*_{j,i}} S(r_{j,i})) then
(16)     newUpdatedQueues = {j} ∪ newUpdatedQueues

Fig. 6. Algorithm Fine_Rel': Changes to algorithm Fine_Rel are listed, assuming $r^*_{i,j}$
satisfies property $\mathcal{CONVEXITY}$.



Proof. From the definition of $S(r_{i,j})$, we infer that no relation $r_{i,j}(X, Y')$, where
$r_{i,j} \in r^*_{i,j}$ and $Y'$ is any successor interval of $Y$ on $P_j$, can be true. Hence interval
$X$ can never be a part of the solution and can be deleted from the queue. □

Lemma 5. If the relationship $R(X,Y)$ between a pair of intervals $X$ and $Y$
(belonging to processes $P_i$ and $P_j$, respectively) does not belong to the set $r^*_{i,j}$,
where $r^*_{i,j}$ satisfies property $\mathcal{CONVEXITY}$, then either interval $X$ or interval
$Y$ is removed from the queue.

Proof. We use contradiction. Assume relation $R(X,Y)$ ($\notin r^*_{i,j}(X,Y)$) is true
for intervals $X$ and $Y$. From Lemma 4, the only time neither $X$ nor $Y$ will be
deleted is when both $R \notin \bigcap_{r_{i,j} \in r^*_{i,j}} S(r_{i,j})$ and $R^{-1} \notin \bigcap_{r_{j,i} \in r^*_{j,i}} S(r_{j,i})$. However,
as $r^*_{i,j}$ satisfies property $\mathcal{CONVEXITY}$, we have that $R \in \bigcap_{r_{i,j} \in r^*_{i,j}} S(r_{i,j})$ or
$R^{-1} \in \bigcap_{r_{j,i} \in r^*_{j,i}} S(r_{j,i})$ must be true. Thus at least one of the intervals can be
deleted by an application of Lemma 4. □
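The CONVEXITY property of Definition 4, on which Lemma 5 relies, lends itself to a mechanical check. The sketch below is one possible way to write that check, assuming the prohibition function is supplied as a dictionary `S` from relation names to sets and the inverses as a dictionary `inverse`; both containers and the function name are hypothetical and would have to be populated from Table 5 for the real relation set.

```python
def is_convex(r_star_ij, r_star_ji, S, inverse, all_relations):
    """Check Definition 4 for one ordered pair of processes (i, j).

    r_star_ij, r_star_ji : sets of acceptable relations for (i, j) and (j, i)
    S                    : dict mapping relation r to the set S(r)
    inverse              : dict mapping relation R to its inverse
    all_relations        : iterable over every relation in the universe
    """
    for R in all_relations:
        if R in r_star_ij:
            continue  # the property only constrains relations outside r*_ij
        # X is prunable if R prohibits every acceptable relation for (i, j).
        x_prunable = all(R in S[r] for r in r_star_ij)
        # Y is prunable if the inverse of R prohibits every acceptable
        # relation for (j, i).
        y_prunable = all(inverse[R] in S[r] for r in r_star_ji)
        if not (x_prunable or y_prunable):
            return False  # neither head could be deleted: possible deadlock
    return True
```

A `False` result identifies exactly the situation described earlier in which neither interval can be removed and the pruning loop would stall.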

The proof of the following theorem is similar to the proof of Theorem 3.

Theorem 5. If the set $r^*_{i,j}$ satisfies property $\mathcal{CONVEXITY}$, then Problem
Fine_Rel' is solved by replacing lines (13) and (15) in algorithm Fine_Rel in
Figure 5 by the lines (13) and (15) in Figure 6.

Proof. Analogous to the proof of Theorem 3. Use Lemmas 4 and 5 instead of
Lemmas 2 and 3, respectively, and reason with $r^*_{i,j}$ instead of with $r_{i,j}$. □

Corollary 1. The time, space, and message complexities of Algorithm
Fine_Rel' are the same as those of Algorithm Fine_Rel, which were stated in
Theorem 4.

Proof. The only changes to Algorithm Fine_Rel are in lines (13) and (15). In
Algorithm Fine_Rel', instead of checking $R(X,Y)$ for membership in $S(r_{i,j})$ in
line (13), $R(X,Y)$ is checked for membership in $\bigcap_{r_{i,j} \in r^*_{i,j}} S(r_{i,j})$. Both $S(r_{i,j})$
and $\bigcap_{r_{i,j} \in r^*_{i,j}} S(r_{i,j})$ are sets of size between 0 and 40. An analogous observation
holds for the change on line (15). Hence, the time, space, and message
complexities of Fine_Rel are unaffected in Fine_Rel'. □
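One way to see why the cost per test does not change is to precompute the intersection once per process pair, after which line (13) of Figure 6 is a single set-membership test. The sketch below assumes the same dictionary representation `S` of the prohibition function as a plain mapping from relation names to sets; the function name and the toy relation names in the usage note are hypothetical.

```python
def prune_set(r_star, S):
    """Return the intersection of S(r) over all r in r_star."""
    sets = [S[r] for r in r_star]
    if not sets:
        raise ValueError("r_star must contain at least one relation")
    return set.intersection(*sets)
```

For instance, with `S = {"r1": {"a", "b", "c"}, "r2": {"b", "c", "d"}}`, calling `prune_set({"r1", "r2"}, S)` yields the set containing `"b"` and `"c"`, and the test on line (13) reduces to checking whether the observed relation is a member of that set.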

To detect $Possibly(\phi)$, $r^*_{i,j}$ is set to the union of the orthogonal
interactions in the first two columns of Table 4. We can verify (by case-by-case
enumeration) that $r^*_{i,j}$ does satisfy property $\mathcal{CONVEXITY}$. Similarly, to detect
$Definitely(\phi)$, $r^*_{i,j}$ is set to the union of the orthogonal interactions in the first
column of Table 4. We can verify (by case-by-case enumeration) that $r^*_{i,j}$ does
satisfy property $\mathcal{CONVEXITY}$.

The following two theorems about using algorithm Fine_Rel' (Figure 6) to
solve Problems Fine_Poss and Fine_Def can be readily proved by using
Theorem 1, the refinement mapping of Table 4, and Theorem 5. The two resulting
algorithms are named Fine_Poss and Fine_Def, respectively.

Theorem 6. Algorithm Fine_Rel modified to algorithm Fine_Rel' (Figure 6)
solves Problem Fine_Poss (about $Possibly(\phi)$) when $r^*_{i,j}$ is set to the union of
the relations in the first and second columns of Table 4.

Proof. From Theorem 1, $Possibly(\phi)$ is true if and only if $(\forall i \in N)(\forall j \in
N)\, Possibly(\phi_i \wedge \phi_j)$. For any $i$ and $j$, $Possibly(\phi_i \wedge \phi_j)$ is true if and only
if $R(X_i, Y_j)$ is any of the temporal relations given in the first two columns of
Table 4. When $r^*_{i,j}$ is set to the union of the relations in these two columns,
we can verify (by case-by-case enumeration) that $r^*_{i,j}$ satisfies $\mathcal{CONVEXITY}$.
As Algorithm Fine_Rel' is correct (by Theorem 5), when its $r^*_{i,j}$ is instantiated
with the set above to get Algorithm Fine_Poss, we have that Fine_Poss is also
correct. □

Theorem 7. Algorithm Fine_Rel modified to algorithm Fine_Rel' (Figure 6)
solves Problem Fine_Def (about $Definitely(\phi)$) when $r^*_{i,j}$ is set to the union of
the relations in the first column of Table 4.

Proof. From Theorem 1, $Definitely(\phi)$ is true if and only if $(\forall i \in N)(\forall j \in
N)\, Definitely(\phi_i \wedge \phi_j)$. For any $i$ and $j$, $Definitely(\phi_i \wedge \phi_j)$ is true if and only
if $R(X_i, Y_j)$ is any of the temporal relations given in the first column of Table 4.
When $r^*_{i,j}$ is set to the relations in this column, we can verify (by
case-by-case enumeration) that $r^*_{i,j}$ satisfies $\mathcal{CONVEXITY}$. As Algorithm Fine_Rel' is
correct (by Theorem 5), when its $r^*_{i,j}$ is instantiated with the set above to get
Algorithm Fine_Def, we have that Fine_Def is also correct. □

In algorithm Fine_Rel', when $r^*_{i,j}$ is set to the values as specified in
Theorems 6 and 7 to detect $Possibly$ and $Definitely$, respectively, the set $\bigcap_{r_{i,j} \in r^*_{i,j}} S(r_{i,j})$
used in line (13) of the algorithm becomes $\{IA\}$ and $\{IA, IB, IC, IG, IH, II\}$,
respectively. An identical change occurs to the set $\bigcap_{r_{j,i} \in r^*_{j,i}} S(r_{j,i})$ on line (15).

Corollary 2. The time, space, and message complexities of Algorithms
Fine_Poss and Fine_Def are the same as those of Algorithm Fine_Rel (stated in
Theorem 4) and of Algorithm Fine_Rel' (stated in Corollary 1).

Proof. Follows from Corollary 1 and the fact that $r^*_{i,j}$ for Fine_Poss and
Fine_Def satisfy $\mathcal{CONVEXITY}$ and are instantiations of $r^*_{i,j}$ in Fine_Rel'. □

6 Discussion & Conclusions 

This paper presented algorithms to detect conjunctive predicates under
fine-grained modalities. Algorithms Fine_Poss and Fine_Def not only detect
$Possibly(\phi)$ and $Definitely(\phi)$, respectively, but also (unlike previous algorithms)
return the pairwise fine-grained relations which exist between all the intervals
in the solution set. The space, message, and computational complexities of the
previous works for conjunctive predicate detection, GW94 [6] and GW96 [7], for
detection of only $Possibly(\phi)$ and $Definitely(\phi)$, respectively, are compared with
those of our algorithms in Table 1. All the complexity measures for algorithms Fine_Poss
and Fine_Def are the same as those for GW94 [6] and GW96 [7]. Thus, with
the same overhead, Algorithms Fine_Poss and Fine_Def do the extra work of
finding the fine-grained relations which exist between the intervals contained in
the solution set for $Possibly$ and $Definitely$.

A detailed version of these results appears in [1]. Distributed algorithms can
be devised for Fine_Poss and Fine_Def based on the distributed algorithm
given in [3] to solve Fine_Rel. A discussion of how intervals might be identified
when trying to use the fine-grained modalities on nonconjunctive predicates, i.e.,
general relational predicates, is given in [11].

Abstract. Action refinement is a practical hierarchical method to ease
the design of large reactive systems. Relating hierarchical specification
to hierarchical implementation is an effective method to decrease the
complexity of the verification of these systems. In our previous work
[15], this issue has been investigated in the simple case of the refinement
of an action by a finite process.

In this paper, on the one hand, we extend our previous results by
considering the issue in general, i.e., refining an abstract action by an arbitrary
process; on the other hand, we exploit different techniques such that our
method is easier to follow and apply in practice.

Keywords: Action refinement, modal logic, specification, verification,
reactive system.



1 Introduction 

Generally speaking, it is difficult, or even impossible, to capture a complex system
at the beginning. The hierarchical development method is one of the practical and
effective methods for designing large systems by specifying and implementing a
system at different levels of abstraction. In process algebraic settings, action
refinement [8] is one such method. We are here interested in the question of
how verification can be incorporated in the hierarchical development. In
particular, we investigate how action refinement can be incorporated into a specification
logic in such a way that it mimics the refinement in the process algebra. In the
literature, some first attempts to solve this problem are given, for example in
[10,13,14].

The main results obtained in [10,13,14] are as follows: Given an abstract
specification $\phi$ in some logic, say the modal $\mu$-calculus, a model $P$ of a
complex system, and a refinement $Q$ for a primitive $a$ in $P$, where $Q$ is a finite
process, build $P[a \leadsto Q]$ and $\phi[a \leadsto Q]$ as the refinements of the model $P$ and
the specification $\phi$, respectively. [10] and [13,14] deal with $P[a \leadsto Q]$ in different
ways, but all define $\phi[a \leadsto Q]$ by replacing $\langle a \rangle$ and $[a]$ in $\phi$ by some formulae of
the forms $\langle a_1 \rangle \langle a_2 \rangle \ldots \langle a_n \rangle$ and $[a_1][a_2] \ldots [a_n]$, respectively, where $a_1 a_2 \ldots a_n$ is



V.A. Saraswat (Ed.): ASIAN 2003, LNCS 2896, pp. 110-124, 2003. 
© Springer- Verlag Berlin Heidelberg 2003 




Combining Hierarchical Specification with Hierarchical Implementation 



111 



a run of Q. Then they prove that P \= 4> iS P[a ^ Q] \= 4>[a Q] under some 
syntactical conditions. 

In the above approaches, the refinements of the specification and the model
are built explicitly on the structure of $Q$. This restricts the refinement step
in two ways. Firstly, some desired properties of the refined system cannot be
deduced in the setting of [10,13,14]. For example, let $P = a;b + a;c$,
$\phi = \langle a\rangle$ and $Q = a';(c';b';d' + c';b')$. It is obvious that
$P \models \phi$ and $Q \models \langle a'\rangle[c']\langle b'\rangle$, so one
expects that $P[a \leadsto Q] \models \langle a'\rangle[c']\langle b'\rangle$.
But this cannot be derived using the approaches of [10,13,14]. Secondly, the
refinement step is restricted to one choice of $Q$ for refining an action $a$,
and this $Q$ appears explicitly both in the refined process and in the refined
specification.

In contrast to this, in [15] we proposed a general approach to constructing
a low-level specification by refining the higher-level specification. But as
in [10,13,14], we only considered the simple case of refining an abstract action
by a finite process. The basic idea is to define a refinement mapping $\Omega$
which maps the high-level specification $\phi$ and the property $\psi$ of the
refinement $Q$ of an abstract action $a$ to a lower-level specification by
substituting $\psi$ for $\langle a\rangle$ and $[a]$ in $\phi$. Since $\psi$ can
be any property that holds in $Q$, we get the expected specification if $\phi$
and $\psi$ are appropriate. In the above example, we get
$\Omega(\phi, \langle a'\rangle[c']\langle b'\rangle, a) = \langle a'\rangle[c']\langle b'\rangle$,
which is exactly what we expect. However, $Q$ can only be a finite process,
which implies that $\psi$ is essentially equivalent to a formula without
fixpoint operators.

But in many applications, an action has to be refined by a process with
potentially infinite behaviour. For example, in programming, we can regard
the interface of a procedure as an abstract action and its body as its
refinement. At an abstract level, we only need the interface instead of the
procedure, but the body has to be substituted for the interface when the
procedure is considered at a lower level. In many cases, we need to implement
a procedure with possibly infinite behaviour in order to meet the given
requirements. For instance, in the example of a salesman [15] (see Section 4),
if we know that the daily job of the salesman in London is to repeatedly meet
some of his customers in his office or contact some of them by phone, then the
action “work” in the top-most specification should be refined by this concrete
procedure in the lower-level specification. Such a refinement cannot be
carried out using our previous approach.

So, in order to have more applications, we extend our previous work by 
refining an abstract action by an arbitrary process in this paper. To this end, 
we adopt FLC as specification language. 

FLC is due to Müller-Olm [16], and is an extension of the $\mu$-calculus by
the chop operator “;”. FLC is strictly more expressive than the
$\mu$-calculus because the former can define non-regular properties [16],
whereas the latter can only express regular properties [7,9]. Model checking
of FLC was addressed in [11,12]. For technical reasons, here we augment FLC
with a special propositional constant $\surd$ that indicates whether a process
is terminated, and we re-interpret $[a]$ appropriately.
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As discussed in [15], a sound refinement mapping should keep the type of 
properties to be refined, i.e. an existential property should be refined to an 
existential property and similarly for the other properties. In order to define a 
refinement mapping that preserves the type of the property to be refined,
as in [15], the property $\psi$ for the refinement $Q$ is partitioned into two
sub-formulae: a universal formula and an existential formula. The former is
substituted for $[a]$ and the latter for $\langle a\rangle$ in $\phi$. Such a
partition is justified by the result proved in [4] that every property can be
represented as the conjunction of a safety property and a liveness property in
branching models. Besides, we require that $\psi$ is only relevant to full
executions of $Q$. If so, a refinement mapping that keeps the type of the
property to be refined can be defined as in [15].
Furthermore, we can prove the following theorem: 

Theorem. (Refinement Theorem) If some syntactical conditions hold,
$P \models \phi$ and $Q \models \psi;\surd$, then
$P[a \leadsto Q] \models \Omega(\phi, \psi, a)$.

The above theorem supports ‘a priori’ verification in the following sense: In
the development process we start with $P \models \phi$ and either refine $P$
and automatically obtain a (relevant) formula that is satisfied by
$P[a \leadsto Q]$; or we refine $\phi$ using $\Omega(\phi, \psi, a)$ and
automatically obtain a refined process $P[a \leadsto Q]$ that satisfies the
refined specification. Of course, such refinement steps may be iterated.

To achieve the intended result, we need to assume that action refinement for
models is atomic. Our main aim in this work is to establish a correspondence
between hierarchical implementation and hierarchical specification of a complex
system. But if we allow the refining process to be interleaved with others,
problems arise. E.g. $(a \parallel_{\{\}} b)[a \leadsto a_1;a_2]$ denotes the
parallel execution of $a$ and $b$ in which $a$ is refined by $a_1;a_2$. It is
obvious that $a \parallel_{\{\}} b$ satisfies $\langle a\rangle$, and that
$a_1;a_2$ satisfies $\langle a_1\rangle;(\langle a_2\rangle \wedge [b];false)$,
which means that $a_1;a_2$ first performs $a_1$, then $a_2$, but cannot perform
$b$. We expect that $a \parallel_{\{\}} b$ meets
$\langle a_1\rangle;(\langle a_2\rangle \wedge [b];false)$ after refining $a$
by $a_1;a_2$. This is not true for non-atomic action refinement, since $b$ can
be performed between the executions of $a_1$ and $a_2$. But it is valid if we
assume that action refinement is atomic [5,6]. So, in the sequel, we discuss
action refinement for models under the assumption of atomicity.

Besides, we exploit different techniques so that the results proved in this
paper are presented in a simpler way and are easier to use in practice.

The remainder of this paper is organized as follows: A modeling language is 
defined in Section 2; Section 3 briefly reviews FLC; A refinement mapping for 
specifications is given in Section 4; The correspondence between the hierarchical 
specification and the hierarchical implementation of a complex system is shown 
in Section 5; Finally, a brief conclusion is provided in Section 6. 

2 Modeling Language — A TCSP-like Process Algebra 

2.1 Syntax 

As in [14], we use a TCSP-like process algebra in combination with an action
refinement operator as modeling language. Let $Act$ be an infinite set of
(atomic) actions, ranged over by $a, b, c, \ldots$, and $A$ be a subset of
$Act$. Let $\mathcal{X}$ be a set of process variables, ranged over by
$x, y, z, \ldots$. The language of processes, denoted by $\mathcal{P}$ and
ranged over by $P, Q, \ldots$, is generated by the following grammar:

Definition 1.

\[ P ::= \delta \mid nil \mid a \mid x \mid P;Q \mid P + Q \mid P \parallel_A Q \mid rec\,x.P \mid P[a \leadsto Q] \]

where $a \in Act$, $x \in \mathcal{X}$, and $P, Q \in \mathcal{P}$.

An occurrence of a process variable $x \in \mathcal{X}$ is called bound in a
process term $P$ iff it occurs within a sub-term of the form $rec\,x.P'$;
otherwise it is called free. A process expression $P$ is called closed iff all
occurrences of each variable occurring in it are bound; otherwise it is called
open. We use $fn(P)$ for the variables that have some free occurrence in $P$,
and $bn(P)$ for the variables that have some bound occurrence in $P$. When we
say that a process $P$ is terminated, we mean that $P$ does nothing except
terminate (see Definition 2). A variable $x \in \mathcal{X}$ is called guarded
within a term $P$ iff every occurrence of $x$ is within a sub-term $Q$ where
$Q$ lies in a subexpression $Q^*;Q$ such that $Q^*$ is not terminated. A term
$P$ is called guarded iff all variables occurring in it are guarded. By abuse
of notation, we sometimes write $Act(P)$ for the set of actions that occur in
$P$.

For technical reasons, as in [8], we require the following well-formedness
conditions on $\mathcal{P}$:

(i) None of the operands of $+$ is a terminated process;

(ii) All process terms are guarded;

(iii) The refinement of an action cannot be a terminated process. As discussed,
e.g. in [17], refining an action by a terminated process is not only
counter-intuitive but also technically difficult.

Intuitively, $P[a \leadsto Q]$ means that the system replaces the execution of
an action $a$ by the execution of the subsystem $Q$ every time the subsystem
$P$ performs $a$. This operator provides a mechanism for designing reactive
systems hierarchically. The other expressions of $\mathcal{P}$ have their usual
meaning. The formal interpretation of $\mathcal{P}$ is provided in the next
section.

2.2 Operational Semantics 

Here we define an operational semantics for P employing transition systems. The 
meaning of the constructs of the language can be interpreted in the standard 
way except for the refinement operator. In order to guarantee the atomicity of 
the refinement, the basic idea is to define a transition system for the process 
that may be refined, then replace all transitions labelled with the action to be 
refined by the transition system for the refinement. 

Similar to [8], the above idea can be implemented by introducing an auxiliary 
operator * to indicate that a process prefixed with it is the remainder of some 
process, which has the highest precedence and must be performed completely. 
The state language, denoted by $\mathcal{P}^*$ and ranged over by $s, \ldots$,
is given by:

\[ s ::= nil \mid \delta \mid a \mid x \mid {*s} \mid s;s \mid P + Q \mid s \parallel_A s \mid s[a \leadsto Q] \mid rec\,x.P \]

where $a \in Act$, $x \in \mathcal{X}$, $P, Q \in \mathcal{P}$.

According to the above definition, it is clear that $\mathcal{P}$ is a proper
subset of $\mathcal{P}^*$, i.e. $\mathcal{P} \subset \mathcal{P}^*$.

In order to define the semantics of $\mathcal{P}^*$, we need the following
definition.



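Definition 2. Let $\surd$ and $ab$ be the minimal relations on $\mathcal{P}^*$
which satisfy the following rules, respectively:

Definition of $\surd$:

\[
\frac{}{\surd(nil)} \qquad
\frac{\surd(s)}{\surd(*s)} \qquad
\frac{\surd(s)}{\surd(rec\,x.s)} \qquad
\frac{\surd(s)}{\surd(s[a \leadsto Q])}\ (Q \in \mathcal{P}) \qquad
\frac{\surd(s_1) \wedge \surd(s_2)}{\surd(s_1 \parallel_A s_2)} \qquad
\frac{\surd(s_1) \wedge \surd(s_2)}{\surd(s_1;s_2)}
\]

Definition of $ab$:

\[
\frac{\surd(s)}{ab(s)} \qquad
\frac{}{ab(a)} \qquad
\frac{}{ab(x)} \qquad
\frac{ab(s_1) \wedge ab(s_2)}{ab(s_1;s_2)} \qquad
\frac{ab(s_1) \wedge ab(s_2)}{ab(s_1 + s_2)} \qquad
\frac{ab(s_1) \wedge ab(s_2)}{ab(s_1 \parallel_A s_2)}
\]
\[
\frac{ab(s)}{ab(rec\,x.s)} \qquad
\frac{ab(s)}{ab(s[a \leadsto Q])}\ (Q \in \mathcal{P})
\]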



Note that in the above definition, $\surd(s)$ means that $s$ is terminated,
whereas $ab(s)$ means that $s$ is either in $\mathcal{P}$ or terminated. A
state $s$ is called abstract if $ab(s)$; otherwise it is called concrete.

Besides complying with the three well-formedness conditions for $\mathcal{P}$,
$\mathcal{P}^*$ also has to satisfy the following well-formedness condition:

(iv) At least one of the operands of $\parallel_A$ is abstract.

An operational semantics of $\mathcal{P}^*$ is given by the following
transition rules:



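\[
\textbf{Act}\ \ \frac{}{a \xrightarrow{a} nil}
\qquad
\textbf{Nd}\ \ \frac{P \xrightarrow{a} s}{P + Q \xrightarrow{a} s \ \text{ and } \ Q + P \xrightarrow{a} s}
\]
\[
\textbf{Seq-1}\ \ \frac{s_1 \xrightarrow{a} s_1'}{s_1;s_2 \xrightarrow{a} s_1';s_2}
\qquad
\textbf{Seq-2}\ \ \frac{\surd(s_1) \text{ and } s_2 \xrightarrow{a} s_2'}{s_1;s_2 \xrightarrow{a} s_2'}
\]
\[
\textbf{Ref-1}\ \ \frac{s \xrightarrow{b} s'}{s[a \leadsto Q] \xrightarrow{b} s'[a \leadsto Q]}\ \ a \neq b
\qquad
\textbf{Ref-2}\ \ \frac{s \xrightarrow{a} s' \text{ and } Q \xrightarrow{b} s_1}{s[a \leadsto Q] \xrightarrow{b} (*s_1);s'[a \leadsto Q]}
\]
\[
\textbf{Rec}\ \ \frac{P[rec\,x.P/x] \xrightarrow{a} s}{rec\,x.P \xrightarrow{a} s}
\qquad
\textbf{Star}\ \ \frac{s \xrightarrow{a} s'}{*s \xrightarrow{a} *s'}
\]
\[
\textbf{A-Syn}\ \ \frac{s_1 \xrightarrow{a} s_1'}{s_1 \parallel_A s_2 \xrightarrow{a} s_1' \parallel_A s_2 \ \text{ and } \ s_2 \parallel_A s_1 \xrightarrow{a} s_2 \parallel_A s_1'}\ \ a \notin A \wedge ab(s_2)
\]
\[
\textbf{Syn}\ \ \frac{s_1 \xrightarrow{a} s_1' \text{ and } s_2 \xrightarrow{a} s_2'}{s_1 \parallel_A s_2 \xrightarrow{a} s_1' \parallel_A s_2' \ \text{ and } \ s_2 \parallel_A s_1 \xrightarrow{a} s_2' \parallel_A s_1'}\ \ a \in A \wedge ab(s_1) \wedge ab(s_2)
\]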



We comment on some of the rules. The rule Nd says that only two processes in
$\mathcal{P}$ can be combined nondeterministically; other cases are impossible
by the definition of $\mathcal{P}^*$. The rule Ref-2 states that the residual
$s_1$ of $Q$ is non-interruptible. The rule Star says that $*s$ behaves like
$s$, but the reached state is still concrete (if not properly terminated). The
rule A-Syn gives priority to the concrete component: whenever a concrete
process is in parallel with an abstract process, the latter has to remain idle
until the former finishes executing. Observe that there is no way to reach a
state where both components are concrete starting from an initial abstract
state (in fact, such a state would not be well-formed). Moreover, if both
components are abstract, the rule allows either of them to proceed first. The
rule Syn states that only two abstract processes can communicate with each
other, since communication between a concrete process and another process may
destroy the atomicity of the refinement. In fact, it is impossible to reach a
state where a concrete process synchronizes with another process from an
initial abstract state. The other rules are as usual. The above rules guarantee
that the execution of the refinement $Q$ is not only non-interruptible, but is
also either executed completely or not at all.
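
The effect of the rules Ref-2, Star and A-Syn on atomicity can be explored with a small executable sketch. The Python below is only an illustration under stated assumptions: the tuple encoding of states and the helper names (`act`, `seq`, `par`, `ref`, `tick`, `ab`, `steps`, `labels`) are invented here, recursion and $\delta$ are left out for brevity, and the rules are implemented as they are read in this section.

```python
# States of the state language as nested tuples (an encoding chosen for this sketch).
NIL = ('nil',)
def act(a): return ('act', a)
def seq(s, t): return ('seq', s, t)
def par(A, s, t): return ('par', frozenset(A), s, t)
def ref(s, a, q): return ('ref', s, a, q)      # s[a ~> q]

def tick(s):
    """Termination predicate of Definition 2 (without rec)."""
    k = s[0]
    if k == 'nil': return True
    if k in ('star', 'ref'): return tick(s[1])
    if k == 'seq': return tick(s[1]) and tick(s[2])
    if k == 'par': return tick(s[2]) and tick(s[3])
    return False

def ab(s):
    """Abstractness predicate of Definition 2 (without rec and variables)."""
    k = s[0]
    if tick(s) or k == 'act': return True
    if k in ('seq', 'choice'): return ab(s[1]) and ab(s[2])
    if k == 'par': return ab(s[2]) and ab(s[3])
    if k == 'ref': return ab(s[1])
    return False                                # e.g. a non-terminated *s

def steps(s):
    """Set of (label, successor) pairs derivable for state s."""
    k = s[0]
    if k == 'act':                              # Act
        return {(s[1], NIL)}
    if k == 'choice':                           # Nd
        return steps(s[1]) | steps(s[2])
    if k == 'seq':
        out = {(a, ('seq', t, s[2])) for a, t in steps(s[1])}   # Seq-1
        if tick(s[1]):
            out |= steps(s[2])                                  # Seq-2
        return out
    if k == 'star':                             # Star
        return {(a, ('star', t)) for a, t in steps(s[1])}
    if k == 'ref':
        _, t, a, q = s
        out = set()
        for b, t2 in steps(t):
            if b != a:                          # Ref-1
                out.add((b, ('ref', t2, a, q)))
            else:                               # Ref-2: residual of q is starred
                for c, q2 in steps(q):
                    out.add((c, ('seq', ('star', q2), ('ref', t2, a, q))))
        return out
    if k == 'par':
        _, A, s1, s2 = s
        out = set()
        if ab(s2):                              # A-Syn, left component moves
            out |= {(a, ('par', A, t, s2)) for a, t in steps(s1) if a not in A}
        if ab(s1):                              # A-Syn, right component moves
            out |= {(a, ('par', A, s1, t)) for a, t in steps(s2) if a not in A}
        if ab(s1) and ab(s2):                   # Syn
            out |= {(a, ('par', A, t1, t2)) for a, t1 in steps(s1)
                    for b, t2 in steps(s2) if a == b and a in A}
        return out
    return set()                                # nil has no transitions

def labels(s):
    return sorted({a for a, _ in steps(s)})

Q = seq(act('a1'), act('a2'))
refined = par([], ref(act('a'), 'a', Q), act('b'))   # a[a ~> a1;a2] || b
plain = par([], Q, act('b'))                         # (a1;a2) || b
```

In this sketch, after the refined system performs `a1` the only enabled label is `a2`, whereas the unrefined parallel composition also offers `b` at that point; this is the behaviour the atomicity assumption is meant to enforce.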

In the following, we investigate the notion of strong bisimulation on
$\mathcal{P}^*$.

Definition 3.

- A binary symmetric relation $R$ over the closed terms of $\mathcal{P}^*$ is a
  strong bisimulation if for any $(s_1, s_2) \in R$:

  • $\surd(s_1)$ iff $\surd(s_2)$; and

  • for any $a \in Act$, if $s_1 \xrightarrow{a} s_1'$, then there exists
    $s_2'$ s.t. $s_2 \xrightarrow{a} s_2'$ and $(s_1', s_2') \in R$.

- $s_1$ and $s_2$ are strongly bisimilar, denoted by $s_1 \sim s_2$, if and
  only if there exists a strong bisimulation $R$ such that $(s_1, s_2) \in R$.

- Let $E, F \in \mathcal{P}^*$ and $fn(E) \cup fn(F) \subseteq \{x_1, \ldots, x_n\}$.
  Then $E \sim F$ iff for any closed terms $s_1, \ldots, s_n$,
  $E\{s_1/x_1, \ldots, s_n/x_n\} \sim F\{s_1/x_1, \ldots, s_n/x_n\}$.
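
For finite transition systems, the largest strong bisimulation in the sense of Definition 3 can be computed by starting from all pairs that agree on termination and repeatedly discarding pairs that violate the transfer condition. The following Python sketch illustrates this; the dictionary encoding of the transition system, the set `done` of terminated states and the state names are assumptions made for the example only.

```python
def bisimilar(lts, done, p, q):
    """Naive greatest-fixpoint check of strong bisimilarity on a finite LTS.

    lts maps each state to a set of (label, successor) pairs;
    done is the set of terminated states.
    """
    states = list(lts)
    # Start with every pair that agrees on termination.
    rel = {(s, t) for s in states for t in states
           if (s in done) == (t in done)}
    changed = True
    while changed:
        changed = False
        for (s, t) in list(rel):
            # Every move of s must be matched by t, and vice versa.
            forth = all(any(a == b and (s2, t2) in rel for b, t2 in lts[t])
                        for a, s2 in lts[s])
            back = all(any(a == b and (s2, t2) in rel for b, s2 in lts[s])
                       for a, t2 in lts[t])
            if not (forth and back):
                rel.discard((s, t))
                changed = True
    return (p, q) in rel

# a;b + a;c (from p0) versus a;(b + c) (from q0)
lts = {
    'p0': {('a', 'p1'), ('a', 'p2')},
    'p1': {('b', 'e')},
    'p2': {('c', 'e')},
    'q0': {('a', 'q1')},
    'q1': {('b', 'e'), ('c', 'e')},
    'e': set(),
}
done = {'e'}
```

Here `bisimilar(lts, done, 'p0', 'q0')` is false, since the state reached by `q0` after `a` offers both `b` and `c`, which neither `a`-successor of `p0` can match.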

According to the above semantics, it is easy to show that

Lemma 1. For any closed term $s \in \mathcal{P}^*$, $s \sim *s$.

Because a concrete process has priority when in parallel with an abstract
process, $\sim$ is not preserved by $\parallel_A$. For example,
$a_1;a_2 \sim a[a \leadsto a_1;a_2]$, but
$(a_1;a_2) \parallel_{\{\}} b \not\sim a[a \leadsto a_1;a_2] \parallel_{\{\}} b$.
However, once we strengthen Definition 3 by adding the following condition:

• $ab(s_1)$ iff $ab(s_2)$,

then the resulting largest bisimulation, denoted by $\sim_{ab}$, is a
congruence relation over $\mathcal{P}^*$. Besides, $\sim_{ab}$ is obviously a
proper subset of $\sim$. That is,

Theorem 1. $\sim_{ab}$ is a congruence over $\mathcal{P}^*$ and
$\sim_{ab} \subset \sim$.

Convention: From now on, we use $P, Q, \ldots$ to stand for processes in
$\mathcal{P}^*$.

3 Fixpoint Logic with Chop (FLC) 

FLC extends the modal $\mu$-calculus with the chop operator “;” and can
express non-regular properties [16]. It is therefore strictly more powerful
than the $\mu$-calculus, since [7,9] proved that only regular properties can
be defined in the $\mu$-calculus. For our purpose, we modify FLC [16] slightly.

Let $X, Y, Z, \ldots$ range over an infinite set $Var$ of variables, let true
and false be two propositional constants as usual, and let $\surd$ be another
one that is used to indicate whether a process is terminated.

The formulae of FLC are generated according to the following grammar:

\[ \phi ::= true \mid false \mid \surd \mid \tau \mid X \mid [a] \mid \langle a\rangle \mid \phi_1 \wedge \phi_2 \mid \phi_1 \vee \phi_2 \mid \phi_1;\phi_2 \mid \mu X.\phi \mid \nu X.\phi \]

where $X \in Var$ and $a \in Act$.¹



¹ In [16], $\tau$ is called term.

In the sequel, we use @ to stand for $\langle a\rangle$ or $[a]$, $p$ for
true, false or $\surd$, and $\sigma$ for $\nu$ or $\mu$; $Act(\phi)$ denotes
the set of all actions that occur in $\phi$.

As in the modal $\mu$-calculus, the two fixpoint operators $\mu X$ and $\nu X$
bind the respective variable $X$, and we apply the usual terminology of free
and bound occurrences of a variable in a formula, closed and open formulae,
etc. $fn(\phi)$ denotes the variables that have some free occurrence in $\phi$
and $bn(\phi)$ the variables that have some bound occurrence in $\phi$. $X$ is
said to be guarded in $\phi$ if each occurrence of $X$ is in a sub-formula
$\phi'$ preceded by a modality $\langle a\rangle$ or $[a]$, or by one of the
constants true, false or $\surd$. If all variables in $\phi$ are guarded, then
$\phi$ is called guarded.

FLC is interpreted over a given labelled transition system
$T = (S, A, \rightarrow)$, where $S \subseteq \mathcal{P}^*$, $A \subseteq Act$,
and $\rightarrow \subseteq S \times A \times S$. A formula is interpreted as a
monotonic predicate transformer, which is simply a mapping $f : 2^S \to 2^S$
that is monotonic w.r.t. the inclusion ordering on $2^S$. We use $MPT_T$ to
denote the set of all monotonic predicate transformers over $S$. $MPT_T$,
together with the inclusion ordering defined by

\[ f \sqsubseteq f' \text{ iff } f(A) \subseteq f'(A) \text{ for all } A \subseteq S, \]

forms a complete lattice. We denote the join and meet operators by $\sqcup$ and
$\sqcap$. By the Tarski-Knaster Theorem, the least and greatest fixed points of
monotonic functions $(2^S \to 2^S) \to (2^S \to 2^S)$ exist. They are used to
interpret the fixed point formulae of FLC.

The meanings of true and false are interpreted in the standard way, i.e.
by $S$ and $\emptyset$ respectively. The meaning of $\surd$ is to map any subset
of $S$ to the subset of $S$ that consists of all terminated processes in $S$.
Therefore, a process $P$ meets $\surd$ iff $\surd(P)$. $\tau$ is interpreted as
the identity. Because $nil$ and $\delta$ behave differently in the presence of
“;”, they should be distinguished by FLC. To this end, $[a]$ is interpreted as
a function that maps a set of processes $A$ to the set of processes that are
not terminated and all of whose $a$-successors are in $A$. This differs from
the original interpretation in [16]. According to our interpretation,
$P \models [a]$ only if $\neg\surd(P)$, whereas in [16] $P \models [a]$ holds
for every $P \in \mathcal{P}^*$. So, it is easy to show that
$nil \not\models \bigwedge_{a \in Act}[a];false$, while
$\bigwedge_{a \in Act}[a];false$ is the characteristic formula of $\delta$.

The meaning of variables is given by an environment
$\rho : Var \to (2^S \to 2^S)$ that assigns to variables monotonic functions
from sets to sets. $\rho[X \mapsto f]$ agrees with $\rho$ except that it
associates $f$ with $X$.



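Definition 4. Formally, given a labelled transition system
$T = (S, A, \rightarrow)$, the meaning of a formula $\phi$, denoted by
$C^T_\rho(\phi)$, is inductively defined as follows:

\[
\begin{array}{rcl}
C^T_\rho(true)(A) &=& S \\
C^T_\rho(false)(A) &=& \emptyset \\
C^T_\rho(\surd)(A) &=& \{P \mid P \in S \wedge \surd(P)\} \\
C^T_\rho(\tau)(A) &=& A \\
C^T_\rho(X) &=& \rho(X) \\
C^T_\rho([a])(A) &=& \{P \mid \neg\surd(P) \wedge \forall P' : P \xrightarrow{a} P' \Rightarrow P' \in A\} \\
C^T_\rho(\langle a\rangle)(A) &=& \{P \mid \exists P' : P \xrightarrow{a} P' \wedge P' \in A\} \\
C^T_\rho(\phi_1 \wedge \phi_2)(A) &=& C^T_\rho(\phi_1)(A) \cap C^T_\rho(\phi_2)(A) \\
C^T_\rho(\phi_1 \vee \phi_2)(A) &=& C^T_\rho(\phi_1)(A) \cup C^T_\rho(\phi_2)(A) \\
C^T_\rho(\phi_1;\phi_2) &=& C^T_\rho(\phi_1) \cdot C^T_\rho(\phi_2) \\
C^T_\rho(\mu X.\phi) &=& \sqcap\{f \in MPT_T \mid C^T_{\rho[X \mapsto f]}(\phi) \sqsubseteq f\} \\
C^T_\rho(\nu X.\phi) &=& \sqcup\{f \in MPT_T \mid C^T_{\rho[X \mapsto f]}(\phi) \sqsupseteq f\}
\end{array}
\]

where $A \subseteq S$, and $\cdot$ stands for the composition operator over
functions.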
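
On a small finite transition system the semantics of Definition 4 can be evaluated directly, which makes the role of the chop operator and of the modified $[a]$ easier to see. The Python sketch below is an illustration under assumptions, not an efficient model checker: formulae are encoded as tuples, each predicate transformer is tabulated over all subsets of $S$, and fixpoints are obtained by plain iteration from the bottom or top transformer. All names (`sem`, `sat`, the tuple tags, the state names) are chosen for this example.

```python
from itertools import combinations

def subsets(S):
    S = sorted(S)
    return [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

def sem(phi, lts, done, env=None):
    """Tabulated predicate transformer of phi: dict from subset A to subset."""
    env = env or {}
    S = frozenset(lts)
    subs = subsets(S)
    def table(f):
        return {A: frozenset(f(A)) for A in subs}
    k = phi[0]
    if k == 'true':  return table(lambda A: S)
    if k == 'false': return table(lambda A: ())
    if k == 'tick':  return table(lambda A: S & done)
    if k == 'tau':   return table(lambda A: A)
    if k == 'var':   return env[phi[1]]
    if k == 'box':   # non-terminated states whose a-successors all lie in A
        a = phi[1]
        return table(lambda A: {p for p in S if p not in done
                                and all(t in A for b, t in lts[p] if b == a)})
    if k == 'dia':   # states with some a-successor in A
        a = phi[1]
        return table(lambda A: {p for p in S
                                if any(b == a and t in A for b, t in lts[p])})
    if k in ('and', 'or', 'chop'):
        f = sem(phi[1], lts, done, env)
        g = sem(phi[2], lts, done, env)
        if k == 'and': return table(lambda A: f[A] & g[A])
        if k == 'or':  return table(lambda A: f[A] | g[A])
        return table(lambda A: f[g[A]])          # chop is composition
    if k in ('mu', 'nu'):
        x, body = phi[1], phi[2]
        cur = table(lambda A: () if k == 'mu' else S)
        while True:                              # iterate to the fixpoint
            nxt = sem(body, lts, done, {**env, x: cur})
            if nxt == cur:
                return cur
            cur = nxt
    raise ValueError(k)

def sat(p, phi, lts, done):
    """P satisfies phi iff P is in C(phi)(S)."""
    return p in sem(phi, lts, done)[frozenset(lts)]

# a;b + a;c, with 'e' the terminated state
lts = {'p0': {('a', 'p1'), ('a', 'p2')}, 'p1': {('b', 'e')},
       'p2': {('c', 'e')}, 'e': set()}
done = {'e'}
some_step = ('or', ('dia', 'a'), ('or', ('dia', 'b'), ('dia', 'c')))
may_terminate = ('chop',
                 ('mu', 'X', ('or', ('chop', some_step, ('var', 'X')), ('tau',))),
                 ('tick',))
```

With this encoding, `p0` satisfies `('chop', ('dia', 'a'), ('dia', 'b'))` but not `('chop', ('box', 'a'), ('dia', 'b'))`, and the terminated state `e` does not satisfy `('box', 'a')`, in line with the modified interpretation of $[a]$.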



A process $P$ is said to satisfy $\phi$, denoted by $P \models \phi$, iff
$P \in C^T_\rho(\phi)(S)$ for some environment $\rho$. $\phi \Rightarrow \psi$
means that $C^T_\rho(\phi)(A) \subseteq C^T_\rho(\psi)(A)$ for any $T$, any
$A \subseteq S$ and any $\rho$. $\phi \Leftrightarrow \psi$ means
$(\phi \Rightarrow \psi) \wedge (\psi \Rightarrow \phi)$. The other notations
are defined in the standard way.

Convention: In the sequel, we assume the following binding precedence among
the operators of the logic: “$;$” > “$\vee$” = “$\wedge$” > “$\nu X.$” =
“$\mu X.$” > “$\Rightarrow$” = “$\Leftrightarrow$”.

Many properties of FLC have been shown in [16], e.g., FLC is strictly more
expressive than the $\mu$-calculus since context-free processes can be
characterized by it; model checking FLC is decidable for finite-state processes
and undecidable for context-free processes; the satisfiability and validity of
FLC are undecidable; FLC does not enjoy the finite-model property; and so on.

[11] proved that FLC has the tree model property², i.e.,

Theorem 2. Given $P, Q \in \mathcal{P}^*$ with $P \sim Q$, then for any closed
$\phi$, $P \models \phi$ iff $Q \models \phi$.

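Given a formula $\phi$, we define its beginning atomic sub-formulae, denoted by
$FSub(\phi)$, as:

\[
FSub(\phi) \triangleq \left\{
\begin{array}{ll}
\{\phi\} & \text{if } \phi = p,\ X,\ \langle a\rangle,\ [a] \text{ or } \tau \\
FSub(\phi_1) \cup FSub(\phi_2) & \text{if } \phi = \phi_1 \wedge \phi_2 \text{ or } \phi = \phi_1 \vee \phi_2 \\
FSub(\phi_1) & \text{if } \phi = \phi_1;\phi_2 \text{ and } \phi_1 \not\Leftrightarrow \tau \\
FSub(\phi_2) & \text{if } \phi = \phi_1;\phi_2 \text{ and } \phi_1 \Leftrightarrow \tau \\
FSub(\phi_1) & \text{if } \phi = \sigma X.\phi_1,\ \sigma \in \{\mu, \nu\}
\end{array}\right.
\]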





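Symmetrically, we define its ending atomic sub-formulae, denoted by
$ESub(\phi)$, as:

\[
ESub(\phi) \triangleq \left\{
\begin{array}{ll}
\{\phi\} & \text{if } \phi = p,\ X,\ \langle a\rangle,\ [a] \text{ or } \tau \\
ESub(\phi_1) \cup ESub(\phi_2) & \text{if } \phi = \phi_1 \wedge \phi_2 \text{ or } \phi = \phi_1 \vee \phi_2 \\
ESub(\phi_1) & \text{if } \phi = \phi_1;\phi_2 \text{ and } \phi_2 \Leftrightarrow \tau \\
ESub(\phi_2) & \text{if } \phi = \phi_1;\phi_2 \text{ and } \phi_2 \not\Leftrightarrow \tau \\
ESub(\phi_1) & \text{if } \phi = \sigma X.\phi_1,\ \sigma \in \{\mu, \nu\}
\end{array}\right.
\]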



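Example 1. $FSub(\langle a\rangle;\langle b\rangle \wedge [c];\langle e\rangle;[f]) = \{\langle a\rangle, [c]\}$,
whereas $ESub(\langle a\rangle;\langle b\rangle \wedge [c];\langle e\rangle;[f]) = \{\langle b\rangle, [f]\}$.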
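
The two definitions are structurally recursive and can be sketched in a few lines of Python. The tuple encoding of formulae is an assumption of this example, and so is the helper `is_tau`: the definitions test semantic equivalence with $\tau$, which the sketch approximates by a purely syntactic check.

```python
ATOMS = ('true', 'false', 'tick', 'tau', 'var', 'box', 'dia')

def is_tau(phi):
    # Assumption: equivalence with tau is approximated syntactically.
    return phi == ('tau',)

def fsub(phi):
    """Beginning atomic sub-formulae of phi."""
    k = phi[0]
    if k in ATOMS:
        return {phi}
    if k in ('and', 'or'):
        return fsub(phi[1]) | fsub(phi[2])
    if k == 'chop':
        return fsub(phi[2]) if is_tau(phi[1]) else fsub(phi[1])
    if k in ('mu', 'nu'):
        return fsub(phi[2])
    raise ValueError(k)

def esub(phi):
    """Ending atomic sub-formulae of phi."""
    k = phi[0]
    if k in ATOMS:
        return {phi}
    if k in ('and', 'or'):
        return esub(phi[1]) | esub(phi[2])
    if k == 'chop':
        return esub(phi[1]) if is_tau(phi[2]) else esub(phi[2])
    if k in ('mu', 'nu'):
        return esub(phi[2])
    raise ValueError(k)

# The formula of Example 1: <a>;<b> /\ [c];<e>;[f]
example1 = ('and',
            ('chop', ('dia', 'a'), ('dia', 'b')),
            ('chop', ('box', 'c'), ('chop', ('dia', 'e'), ('box', 'f'))))
```

Applied to `example1`, `fsub` and `esub` return the two sets given in Example 1.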

² The proof of the tree model property of FLC in [11] still works in our case.

When we say that $\surd$ only occurs at the end of $\phi$, we mean that $\surd$
can only be in $ESub(\phi)$ as a sub-formula of $\phi$ and cannot appear
elsewhere in the formula.

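Definition 5. A formula $\phi$ is called an existential formula if for any
$a \in Act$, $[a] \notin FSub(\phi)$. We use $\mathcal{EF}$ to stand for the set
of existential formulae. Dually, a formula $\phi$ is called a universal formula
if for any $a \in Act$, $\langle a\rangle \notin FSub(\phi)$. We use
$\mathcal{UF}$ to stand for the set of universal formulae. For technical
reasons, we stipulate that $\tau \notin \mathcal{UF}$. A formula $\phi$ is
called a property formula if $\phi \Leftrightarrow \phi_1 \wedge \phi_2$, where
$\phi_1 \in \mathcal{EF}$ and $\phi_2 \in \mathcal{UF}$. The set of property
formulae is denoted by $\mathcal{PF}$.

For $\mathcal{EF}$ and $\mathcal{UF}$, we have

Theorem 3. $\mathcal{EF}$ and $\mathcal{UF}$ are closed under all operators of
the logic. I.e., $\phi\ op\ \psi \in \mathcal{EF}$ ($\mathcal{UF}$) and
$\sigma X.\phi \in \mathcal{EF}$ ($\mathcal{UF}$) if
$\phi, \psi \in \mathcal{EF}$ ($\mathcal{UF}$), for any $\phi$ and $\psi$,
where $op \in \{\vee, \wedge, ;\}$.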

4 Towards Hierarchical Specification 

As the complexity of reactive system designs becomes overwhelming very quickly,
the design formalisms employed must support methods that allow designs to be
developed in a hierarchical fashion. Such methods allow a design to be
developed at different levels of abstraction, thereby making the development
procedure more transparent and thus tractable: most likely, a developer first
divides the intended (complex) design into various “sub-designs” to capture
the abstract overall structure of the complete design. Subsequently, the
sub-designs are developed by enriching them step by step with details. This is
the design technique usually encountered in practice, see e.g. [18]. In process
algebraic settings, action refinement as introduced in Section 2 supports
hierarchical design.

In [15], we investigated how to provide such a technique in a logical
framework. To this end, a refinement mapping is defined that substitutes the
property of the refinement of an abstract action $a$ for the modalities
$\langle a\rangle$ and $[a]$ in a high-level specification and produces a
lower-level specification. However, in [15] we only considered the case where
all specifications are represented by formulae in the subset $\mathcal{NF}$ of
FLC, called normal form formulae, which essentially correspond to the
$\mu$-calculus with $\tau$, and the properties of refinements by formulae
without fixpoint operators in that subset. This is because in [15] we
concentrated on the simple case of refining an abstract action by a finite
process. Here, we consider the issue in general, i.e., refining an abstract
action by any process. To this end, we adopt FLC itself as specification
language instead of $\mathcal{NF}$, because after refining a formula in
$\mathcal{NF}$ with a property of a recursive process, the resulting formula
may no longer be in $\mathcal{NF}$. For example, suppose that
$\langle a\rangle p \in \mathcal{NF}$ and $a$ is refined by a process with the
property $\nu X.\langle a'\rangle X \wedge \langle c'\rangle$. By our
definition, the refined specification is
$(\nu X.\langle a'\rangle X \wedge \langle c'\rangle)p$. It is easy to prove
that there exists no $\phi \in \mathcal{NF}$ that is equivalent to this
specification.
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In a logical framework, actions are addressed as modalities and descriptions 
of systems are represented by formulae. In most modal logics, there are two
kinds of modalities, i.e. $\langle a\rangle$ and $[a]$, which are used to
express existential and universal properties respectively. As discussed in
[15], a refinement mapping should be property-preserving, i.e. an existential
property should be refined to an existential property and similarly for the
other properties. Otherwise, the mapping is meaningless since it is impossible
to establish a correspondence between action refinement for models and action
refinement for specifications. For example,
$P = a;b + a;c \models \langle a\rangle;\langle b\rangle$ and
$a_1;a_2 \models [a_1];\langle a_2\rangle$, but
$P[a \leadsto a_1;a_2] \not\models ([a_1];\langle a_2\rangle);\langle b\rangle$,
since in the high-level specification $\langle a\rangle;\langle b\rangle$ is an
existential property, whereas its refinement becomes a universal property.

To ensure that the mapping is property-preserving, we partition the property
$\psi$ of the refinement of $a$ into two parts: an existential property
$\psi_1$ and a universal property $\psi_2$, i.e. $\psi \in \mathcal{PF}$. $[a]$
is replaced by $\psi_2$, and $\langle a\rangle$ by $\psi_1$. This is justified
by the result shown in [4] that any property can be presented as the
intersection of a liveness property and a safety property in branching temporal
logics. So, $\mathcal{PF}$ is powerful enough to define the properties of
reactive systems.

Therefore, we define the refinement mapping as follows:

Definition 6. Suppose $\phi$ is a high-level specification, $a$ is an abstract
action to be refined, and $\psi = \psi_1 \wedge \psi_2 \in \mathcal{PF}$ is the
description of the refinement of $a$, where $\psi_1 \in \mathcal{EF}$ and
$\psi_2 \in \mathcal{UF}$. We define the refinement mapping, denoted by
$\Omega(\phi, \psi, a)$, as follows:

\[ \Omega(\phi, \psi, a) = \phi\{\psi_1\{\tau/\surd\}/\langle a\rangle,\ \psi_2\{\tau/\surd\}/[a]\}, \]

where $\phi\{\psi/\chi\}$ means substituting $\psi$ for each occurrence of
$\chi$ in $\phi$, with $\chi \in \{X, \surd, \langle a\rangle, [a]\}$.
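
Since the refinement mapping is a pair of syntactic substitutions, it can be sketched as a short recursive function over formula trees. The Python below is an illustration only: the tuple encoding, the function names `substitute` and `omega`, and the convention of passing the existential part and the universal part of the property as separate arguments are assumptions of this example.

```python
def substitute(phi, repl):
    """Replace every atomic sub-formula of phi that is a key of repl."""
    if phi in repl:
        return repl[phi]
    k = phi[0]
    if k in ('and', 'or', 'chop'):
        return (k, substitute(phi[1], repl), substitute(phi[2], repl))
    if k in ('mu', 'nu'):
        return (k, phi[1], substitute(phi[2], repl))
    return phi  # other atoms are left unchanged

def omega(phi, psi1, psi2, a):
    """Refinement mapping: psi1 replaces <a>, psi2 replaces [a].

    In both psi1 and psi2 the termination constant is first replaced by tau.
    """
    tick_to_tau = {('tick',): ('tau',)}
    return substitute(phi, {
        ('dia', a): substitute(psi1, tick_to_tau),
        ('box', a): substitute(psi2, tick_to_tau),
    })

# phi = <a>;<b>, refined with psi1 = <a1>;<a2>;tick and psi2 = [a1];[a2];tick
phi = ('chop', ('dia', 'a'), ('dia', 'b'))
psi1 = ('chop', ('dia', 'a1'), ('chop', ('dia', 'a2'), ('tick',)))
psi2 = ('chop', ('box', 'a1'), ('chop', ('box', 'a2'), ('tick',)))
refined = omega(phi, psi1, psi2, 'a')
```

Here `refined` is the tree of $(\langle a_1\rangle;\langle a_2\rangle;\tau);\langle b\rangle$: the existential modality is replaced by the existential part only, so the type of the property is preserved.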

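According to the above definition, it is easy to get the following results.

Lemma 2. Suppose $X$ does not occur in $\psi$. Then

\[ \Omega(\phi_1\{\phi_2/X\}, \psi, a) \Leftrightarrow \Omega(\phi_1, \psi, a)\{\Omega(\phi_2, \psi, a)/X\}. \]

Lemma 3. (1) If $\phi \Leftrightarrow \phi'$ then
$\Omega(\phi, \psi, a) \Leftrightarrow \Omega(\phi', \psi, a)$;

(2) If $\psi \Leftrightarrow \psi'$ and $\surd$ only occurs at the ends of
$\psi$ and $\psi'$, then
$\Omega(\phi, \psi, a) \Leftrightarrow \Omega(\phi, \psi', a)$.

Theorem 4 (Applicability). If $\phi \in$ FLC and $\psi \in \mathcal{PF}$, then
$\Omega(\phi, \psi, a) \in$ FLC; if $\phi, \psi \in \mathcal{PF}$, then
$\Omega(\phi, \psi, a) \in \mathcal{PF}$.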

Here, we further study the example of a salesman, first presented in [6] and
investigated in [15], to demonstrate how to employ our approach to specify a
complex system hierarchically.

Example 2. Suppose that a salesman has to go by car from his office in Paris to
another office in London, work there for some time, and then go back to Paris,
repeatedly. He takes a hovercraft to cross the Channel.

So, the top-most specification of the system can be represented as follows:

\[
\phi \triangleq \nu X.\left(
\begin{array}{l}
\langle leave\_Paris\rangle; [fr\_thr\_Channel]; \langle arrive\_in\_London\rangle; \langle work\rangle; \\
\langle leave\_London\rangle; [gb\_thr\_Channel]; \langle arrive\_in\_Paris\rangle; X
\end{array}\right)
\]

where the actions “work” and “x_thr_Channel” will be refined subsequently.

The job of the salesman in London is to repeatedly contact some of his
customers by phone, or to meet some of them in his office to discuss something.
Therefore, we can refine the action “work” by a process that meets the following
property:

\[ \psi_1 \triangleq \nu X.(\langle contact\_Customers\rangle \vee \langle meet\_Customers\rangle); X \wedge \langle finish\_Work\rangle. \]

Meanwhile, we can describe “x_thr_Channel” in more detail. There are two
platforms, one on each side of the Channel, that take charge of the
hovercraft. At the beginning, one of them loads the salesman’s car and then
arranges for the hovercraft to depart. Then the hovercraft crosses the
Channel. After the hovercraft arrives at the opposite side, the other platform
unloads the car. Hence, “x_thr_Channel” can be enriched as follows:

\[ \psi_x \triangleq [x\_load]; [x\_departure]; \langle cross\_Channel\rangle; \langle \bar{x}\_arrival\rangle; \langle \bar{x}\_unload\rangle \wedge true. \]

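Furthermore, we can refine “x_departure” by a process with the property

\[ \psi_2 \triangleq [finish\_loading]; \langle engine\_on\rangle; \langle bye\_bye\rangle \wedge true, \]

where finish_loading signals the end of loading, and cross_Channel by a process
with the property

\[ \psi_3 \triangleq true \wedge \langle sit\_down\rangle; (\nu X.(\langle newspaper\rangle \vee \langle tea\rangle \vee \langle coffee\rangle); X \wedge \langle keep\_idle\rangle); \langle stand\_up\rangle. \]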

So, the specification for the final system can be represented by

\[ \Omega(\Omega(\phi, \Omega(\Omega(\psi_x, \psi_2, x\_departure), \psi_3, cross\_Channel), x\_thr\_Channel), \psi_1, work), \]

where $x \in \{fr, gb\}$, and if $x = fr$ then $\bar{x} = gb$ else
$\bar{x} = fr$.

It is obvious that in [15] we cannot refine “work” and “cross_Channel” by
processes that satisfy $\psi_1$ and $\psi_3$ respectively, because on the one
hand the resulting specification is no longer in $\mathcal{NF}$, and on the
other hand “work” and “cross_Channel” both need to be refined by processes
with possibly infinite behaviour.



5 Relating Hierarchical Specification to Hierarchical Implementation of a Large System

In this section, we establish a correspondence, presented by the Refinement
Theorem below, between hierarchical specification and hierarchical
implementation of a complex system. It states that if $Q \models \psi;\surd$,
$P \models \phi$ and some syntactical conditions hold, then
$P[a \leadsto Q] \models \Omega(\phi, \psi, a)$. This result supports
‘a priori’ verification. In the development process we start with
$P \models \phi$ and either refine $P$ and automatically obtain a (relevant)
formula that is satisfied by $P[a \leadsto Q]$; or we refine $\phi$ using
$\Omega(\phi, \psi, a)$ and automatically obtain a refined process
$P[a \leadsto Q]$ that satisfies the refined specification. Of course, such
refinement steps may be iterated.
In order to ensure that the Refinement Theorem is valid, the following
syntactical conditions are necessary.

First of all, it is required that
$(Act(P) \cup Act(\phi)) \cap (Act(Q) \cup Act(\psi)) = \emptyset$, because of
the following considerations:

(i) As far as action refinement for models is concerned, no deadlock is
introduced or destroyed;

(ii) No mismatch between $P[a \leadsto Q]$ and $\Omega(\phi, \psi, a)$ is
caused because $\phi$ involves $Q$. For instance, let $P = a;b$,
$\phi = [a];\langle b\rangle \wedge [c];\langle d\rangle$, $Q = c;e$ and
$\psi = [c];\langle e\rangle$. It is obvious that $P \models \phi$ and
$Q \models \psi;\surd$, but
$P[a \leadsto Q] \not\models \Omega(\phi, \psi, a)$;

(iii) Symmetrically, no mismatch between $P[a \leadsto Q]$ and
$\Omega(\phi, \psi, a)$ is caused because $\psi$ involves $P$. For example, let
$P = a;b + b;a$, $\phi = [a];\langle b\rangle$, $Q = c;e$ and
$\psi = [c];\langle e\rangle \wedge [b];\langle d\rangle$. It is obvious that
$P \models \phi$ and $Q \models \psi;\surd$, but
$P[a \leadsto Q] \not\models \Omega(\phi, \psi, a)$.

It is clear that this condition guarantees the above three requirements.

Besides, it is possible that $\psi$ only describes partial executions of $Q$,
so that the refined specification may not be satisfied by the refined system.
For example, it is obvious that
$a;b + a;c \models \langle a\rangle;\langle b\rangle$ and
$a_1;a_2 \models \langle a_1\rangle$, but
$(a;b + a;c)[a \leadsto a_1;a_2] \not\models \langle a_1\rangle;\langle b\rangle$.
To solve this problem, we require that $\psi$ describes full executions of $Q$,
i.e., $Q \models \psi;\surd$. Normally, we only consider refining an abstract
action $a$ by a normed process $Q$, i.e., any derivative $Q'$ of $Q$ may
terminate in finitely many steps. If so, for any given $Q$ and
$\psi \in \mathcal{PF}$ with $Q \models \psi$, the above requirement can be
satisfied by using
$\psi' = \psi;\mu X.((\bigvee_{a \in Act}\langle a\rangle);X \vee \tau)$
instead of $\psi$. It is clear that $\psi' \in \mathcal{PF}$, that
$P \models \psi$ iff $P \models \psi'$ for each $P \in \mathcal{P}^*$, and that
$Q \models \psi';\surd$. Therefore, in most cases, the above constraint does
not restrict the applicability of the theorem.

Finally, it is possible that $\surd$ as a sub-formula of $\psi$ makes the
sub-formulae that follow it via “;” irrelevant when calculating the meaning of
$\psi$, while these sub-formulae play a nontrivial role in interpreting
$\Omega(\phi, \psi, a)$. E.g.
$a';nil \models \langle a'\rangle;\surd;[a'];\langle b'\rangle$ and
$a;a;c \models \langle a\rangle;\langle a\rangle;\langle c\rangle$, but

\[ (a;a;c)[a \leadsto a';nil] \not\models (\langle a'\rangle;\tau;[a'];\langle b'\rangle);(\langle a'\rangle;\tau;[a'];\langle b'\rangle);\langle c\rangle. \]

So, we require that $\surd$ can only appear at the end of $\psi$ as a
sub-formula. In fact, such a requirement is reasonable, because every formula
can be transformed into an equivalent formula of this form, since
$\surd;\phi \Leftrightarrow \surd$.

Now, we can represent our Refinement Theorem as follows: 
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Theorem 5 (Refinement Theorem). 

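If $(Act(P) \cup Act(\phi)) \cap (Act(\psi) \cup Act(Q)) = \emptyset$,
$Q \models \psi;\surd$ and $P \models \phi$, then
$P[a \leadsto Q] \models \Omega(\phi, \psi, a)$, where $\psi \in \mathcal{PF}$
and $\surd$ only occurs at the end of $\psi$.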

In order to demonstrate how to apply the Refinement Theorem to verify a
complex system hierarchically, we continue Example 2.

Example 3. As explained in Example 2, at the top level, we can implement the
system as:

\[ Sys \triangleq fr\_Channel \parallel_{\{fr\_thr\_Channel\}} Salesman \parallel_{\{gb\_thr\_Channel\}} gb\_Channel, \]

where $x\_Channel = rec\,y.\,x\_thr\_Channel; y$, and

\[
\begin{array}{rl}
Salesman \triangleq rec\,x. & leave\_Paris; fr\_thr\_Channel; arrive\_in\_London; \\
 & work; leave\_London; gb\_thr\_Channel; arrive\_in\_Paris; x.
\end{array}
\]

It is obvious that $Sys \models \phi$.

Then, we can refine “work” by $Subsys_1$, which is defined by

\[ Subsys_1 \triangleq rec\,x.((contact\_Customers + meet\_Customers); x + finish\_Work). \]

It is obvious that $Subsys_1 \models \psi_1;\surd$.

Then, “x_thr_Channel” can be implemented by

$Subsys_x \triangleq x\_load \parallel_{\{x\_load\}} Channel$,

where $Channel \triangleq fr\_Platform \parallel_{\{fr\_arrival, fr\_departure\}} Hovercraft \parallel_{\{gb\_arrival, gb\_departure\}} gb\_Platform$,

where $Hovercraft \triangleq fr\_departure; cross\_Channel; gb\_arrival + gb\_departure; cross\_Channel; fr\_arrival$
and $x\_Platform \triangleq x\_load; x\_departure + x\_arrival; x\_unload$.

It’s easy to show that $Subsys_x \models \psi_x; \surd$.

Furthermore, we can refine “x_departure” by $Subsys_2$ and “cross_Channel”
by $Subsys_3$, where

$Subsys_2 \triangleq finish\_loading; engine\_on; bye\_bye$,

$Subsys_3 \triangleq sit\_down; \mathbf{rec}\, x.(((coffee + tea) \parallel_{\{\}} newspaper); x + keep\_idle); stand\_up$.

Certainly, $Subsys_2 \models \psi_2; \surd$ and $Subsys_3 \models \psi_3; \surd$.

The final system is obtained as:

$Sys[work \leadsto Subsys_1,\ x\_thr\_Channel \leadsto Subsys_x[x\_departure \leadsto Subsys_2,\ cross\_Channel \leadsto Subsys_3]]$,

where $x \in \{fr, gb\}$.

According to the Refinement Theorem, the final system satisfies the final
specification. □
6 Concluding Remarks

In this paper, we extend our previous work on combining hierarchical
specification with hierarchical implementation of a complex system by allowing
an abstract action to be refined by an arbitrary process. Technically, we also
greatly simplify our previous work so that our method can be applied more
easily in practice. Furthermore, we establish a correspondence between
hierarchical specification and hierarchical implementation that supports
‘a priori’ verification in system design.

Similar results are shown in [10,14], but in their approaches, a refined
specification is obtained from the original specification and the refinement Q,
where Q is a finite process. Therefore, besides sharing the restriction of our
previous work [15], their approaches cannot derive certain interesting expected
properties of the refined system. Moreover, we can show that their approaches
can be seen as a special case of our method from a specification-constructing
point of view. [2] discussed composing and refining specifications of reactive
systems as sound rules of a logic. [1] considered the following problem: given
a low-level specification and a higher-level specification, how to construct a
mapping from the former to the latter that guarantees that the former
implements the latter. Our refinement mapping $\Omega$ maps the abstract
specification to the lower-level specification, i.e., we go in the converse
direction.

In our framework, composing specifications can also be dealt with. For
example, supposing $P \models \phi; \surd$, where $\surd$ only occurs at the end
of $\phi$, and $Q \models \psi$, we can get a composite specification like
$\phi; \psi$ for the combined system $P; Q$.

In this paper, we use the standard interleaving setting, so we only consider
the case of atomic action refinement for models, because the standard
bisimulation notion is not preserved by non-atomic action refinement in this
setting. In fact, we believe our approach may also be applied to non-atomic
action refinement if an appropriate logic interpreted over some true
concurrency setting, such as event structures, is available. How to establish
such a logic is still an open question.
Abstract. Using recent results on integrating induction schemes into
decidable theories, a method for generating lemmas useful for reasoning 
about T-based function definitions is proposed. The method relies on 
terms in a decidable theory admitting a (finite set of) canonical form 
scheme(s) and ability to solve parametric equations relating two canoni- 
cal form schemes with parameters. Using nontrivial examples, it is shown 
how the method can be used to automatically generate many simple 
lemmas; these lemmas are likely to be found useful in automatically 
proving other nontrivial properties of T-based functions, thus unbur- 
dening the user of having to provide many simple intermediate lemmas. 
During the formalization of a problem, after a user inputs T-based def- 
initions, the method can be employed in the background to explore a 
search space of possible conjectures which can be attempted, thus build- 
ing a library of lemmas as well as false conjectures. This investigation 
was motivated by our attempts to automatically generate lemmas aris- 
ing in proofs of generic, arbitrary data-width parameterized arithmetic 
circuits. The scope of applicability of the proposed method is broader, 
however, including generating proofs for proof-carrying codes, certifica- 
tion of proof-carrying code as well as in reasoning about distributed 
computation algorithms. 



1 Introduction 

A major challenge in automating proofs of inductive properties is the need to gen- 
erate intermediate lemmas typically needed in such proofs [2,3,15]. Often, many 
lemmas needed in such proofs are simple properties of functions appearing in 
an original conjecture and they can be easily proved once they can be properly 
formulated. Inability to speculate intermediate lemmas is one of the reasons, 
we suspect, for induction theorem provers not being used by domain experts in 
their applications including verification of hardware circuits and distributed al- 
gorithms, certification of proof-carrying codes, analysis of design specifications, 

* This research was partially supported by an NSF ITR award CCR-0113611. 
etc. For a novice user of an induction theorem prover, such intermediate lemmas
are hard to formulate, since failed proof attempts of the original application spe- 
cific conjecture have to be analyzed manually. In order to do this successfully, 
the application expert has to become an expert in understanding the internal 
representation of proofs, how proofs are generated by theorem provers, and
various heuristics employed to generate such proofs. This can be tedious and
time-consuming. This demand can become an added burden on the application
user since most such properties are a result of a particular formalization being 
attempted, and may have little to do directly with the application. 

The use of theorem provers can be made more effective if simple and seem- 
ingly obvious properties of functions employed in the formalization of application 
specific properties can be automatically attempted. This can provide immediate 
feedback to the user about the functions used in the formalization thus either 
leading to fixes in the formalization or enhancing confidence in the formalization. 

In this paper, a method is proposed using decision procedures for speculating 
and proving “simple” properties of functions likely to be found useful soon after 
the function definitions are introduced. The proposed approach relies on the
structure of $\mathcal{T}$-based function definitions introduced in [11] and
properties of decision procedures for decidable theories $\mathcal{T}$. Below,
we briefly discuss the main ideas underlying the proposed approach.

This line of research was motivated by our attempts to automatically generate 
lemmas arising in proofs of generic, arbitrary parameterized arithmetic circuits. 
The scope of applicability of the proposed method is broader, however, including 
generating proofs for proof-carrying codes, certification of proof-carrying code 
as well as in reasoning about distributed computation algorithms. 



1.1 Overview 

Given a definition of $f$, the main idea is to hypothesize a conjecture of the
form $f(x_1, \ldots, x_k) = r$, where $r$ is an unknown (parameterized) term in
a decidable theory $\mathcal{T}$, possibly involving $x_1, \ldots, x_k$ as well
as parameters $p_i$ whose values need to be determined for all possible values
of the $x_j$’s. In fact, $r$ is one of the many possible canonical forms of a
term in $\mathcal{T}$. As shown in [11], such a conjecture can be decided if
$f$ is $\mathcal{T}$-based. A proof of this conjecture is attempted, possibly
leading to instantiation of various parameters. If all the parameters can be
consistently instantiated, then the right-hand side $r$ of the conjecture can
be determined, thus implying that $f$ is expressible in $\mathcal{T}$.

For example, consider the function cost denoting a recurrence relation for a
divide and conquer algorithm.

1. $cost(0) \to 0$,
2. $cost(1) \to 2$,
3. $cost(s(x) + s(x)) \to cost(x) + cost(x) + 4$,
4. $cost(s(s(x) + s(x))) \to cost(x) + cost(x) + 6$.

As discussed later in section 2 (also see [11]), the above definition can be
shown to be based in the quantifier-free theory of Presburger arithmetic,
denoted by PA.¹

¹ The theory PA includes 0, s, +, =; the symbols 1, 2, 3, etc., stand for s(0),
s(s(0)), s(s(s(0))), respectively.
It may be speculated that $cost(m)$ is expressible in PA and is thus equivalent
to a term in PA, say $k_1 m + k_2$ with $k_1, k_2$ as its parameters (without
any loss of generality); $k_1 m$ stands for $m$ repeated $k_1$ times. So we
generate a parametric conjecture P1: $cost(m) = k_1 m + k_2$ with the goal of
finding $k_1, k_2$.

An induction proof using the cover set method [16] and the definition of cost
is attempted. The first subgoal, generated by unifying the left side of rule 1
with the left side of P1 with substitution $\{m \to 0\}$, is
$cost(0) = k_1 0 + k_2$, which simplifies using rule 1 to the constraint
s1: $k_2 = 0$. The second subgoal, generated by unifying the left side of rule 2
with the left side of P1 with substitution $\{m \to 1\}$, is
$cost(1) = k_1 + k_2$, which simplifies using rule 2 to the constraint
s2: $k_1 + k_2 = 2$. In the third subgoal, the conclusion generated by unifying
the left side of rule 3 with the left side of P1 with substitution
$\{m \to s(x) + s(x)\}$ is $cost(s(x) + s(x)) = k_1 x + k_1 + k_1 x + k_1 + k_2$,
which simplifies using rule 3 to
$cost(x) + cost(x) + 4 = k_1 x + k_1 + k_1 x + k_1 + k_2$. Replacing $cost(x)$
by $k_1 x + k_2$, one gets
$k_1 x + k_2 + k_1 x + k_2 + 4 = k_1 x + k_1 + k_1 x + k_1 + k_2$, which further
simplifies to s3: $k_2 + 4 = k_1 + k_1$. Similarly, from the fourth subgoal
generated using rule 4, we get s4: $k_2 + 6 = k_1 + k_1 + k_1$.

Solving the constraints s1, s2, s3, s4 (using a decision procedure for PA)
gives $k_1 = 2$, $k_2 = 0$; thus, $cost(m) = 2m$, implying that cost is
expressible in PA.²
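
The derivation for cost can be checked mechanically. The following Python sketch is an illustration only and not the decision procedure used in the paper: it assumes that s(x) is read as x + 1 on the natural numbers, evaluates cost by rules 1-4, and finds the parameters by a bounded enumeration. The helper names `cost` and `solve_constraints` and the search bound are hypothetical.

```python
from itertools import product


def cost(m: int) -> int:
    """Evaluate cost by rules 1-4, reading s(x) as x + 1 on the naturals."""
    if m == 0:
        return 0                      # rule 1
    if m == 1:
        return 2                      # rule 2
    if m % 2 == 0:                    # m = s(x) + s(x), i.e. m = 2x + 2
        x = (m - 2) // 2
        return cost(x) + cost(x) + 4  # rule 3
    x = (m - 3) // 2                  # m = s(s(x) + s(x)), i.e. m = 2x + 3
    return cost(x) + cost(x) + 6      # rule 4


def solve_constraints(bound: int = 10):
    """Enumerate natural values of (k1, k2) below `bound` satisfying s1-s4."""
    return [
        (k1, k2)
        for k1, k2 in product(range(bound), repeat=2)
        if k2 == 0                    # s1
        and k1 + k2 == 2              # s2
        and k2 + 4 == k1 + k1         # s3
        and k2 + 6 == k1 + k1 + k1    # s4
    ]


solutions = solve_constraints()
print(solutions)
```

The enumeration stands in for a PA decision procedure only because the constraints here are small; the recursive evaluator gives an independent check that the resulting closed form agrees with the rules on sampled inputs.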

Typically, it will not be possible to express a recursively defined function
in $\mathcal{T}$, since new recursive definitions are often introduced because
the functions being defined cannot be expressed in $\mathcal{T}$. (If a defined
function can be expressed in $\mathcal{T}$, that is more likely to be an error
in the definition than the user’s improper grasp of the expressive power of the
theory.) What is more likely is that, in general, parameters cannot be
consistently instantiated; instead, for some specific values of the $x_i$’s,
parameter values can be obtained, thus implying that for those values of the
$x_i$’s, the instantiated conjecture equals some term in $\mathcal{T}$.

For example, consider the following definition of e2plus:

1. $e2plus(0, 0) \to 1$,
2. $e2plus(0, s(y)) \to e2plus(0, y) + 1$,
3. $e2plus(s(x), 0) \to e2plus(x, 0) + e2plus(x, 0)$,
4. $e2plus(s(x), s(y)) \to s(e2plus(s(x), y))$.

Similar to the previous example, it may be speculated that e2plus can be
expressed in PA, and a conjecture P2: $e2plus(x, y) = k_1 x + k_2 y + k_3$ may
be hypothesized with the goal of finding $k_1, k_2, k_3$.

To solve for the parameters, constraints are generated from rules 1-4.
The first constraint s1: $k_3 = 1$ comes from rule 1. The second constraint is
generated from rule 2 by replacing the left side and the recursive call with
$k_1 0 + k_2 s(y) + k_3$ and $k_1 0 + k_2 y + k_3$, respectively. It simplifies
to s2: $k_2 = 1$. The third and fourth constraints are generated in a similar
fashion from rules 3 and 4, and they simplify to s3: $k_1 = k_1 x + k_3$ and
s4: $k_2 = 1$. These constraints are not satisfiable for all values of $x$ and
$y$, primarily because of s3, which forces $k_1 = 0$ and hence $k_3 = 0$,
whereas the constraint s1 gives $k_3 = 1$. So, there is no

² In almost all cases, it is possible to substitute the parametric right side
of the conjecture into each of the rules in the definition and directly get a
constraint. The cover set induction method is used just as a general-purpose
mechanism to do the same.
solution for all values of $x$, and hence e2plus cannot be expressed in PA. As
the reader will notice, there are two kinds of constraints: (i) constraints
purely in terms of parameters, and (ii) constraints involving both parameters
and variables (e.g., s3).

Now conjectures with proper instances of $e2plus(x, y)$ as their left sides can
be speculated. In order to generate the most general conjectures, maximal
consistent subsets of the inconsistent set of constraints {s1, s2, s3, s4} are
identified (see subsection 3.2 for details). One such maximal consistent subset
of constraints is {s1, s2, s4}, giving $k_2 = 1$, $k_3 = 1$ with $k_1$
unconstrained. For these values of parameters, constraint s3 can be solved for
specific values of variables. In particular, if $x = 0$, then s3 reduces to
s3.1: $k_1 = k_3$, which can be consistently added to {s1, s2, s4}.³ The rules
associated with the instances are computed by splitting the rule using the
variable substitution (see subsection 2.3 for details); e.g., rule 3 is split
by $\{x \to 0\}$ into two rules,

3.1 $e2plus(s(0), 0) \to 2$,
3.2 $e2plus(s(s(x)), 0) \to e2plus(x, 0) + e2plus(x, 0) + e2plus(x, 0) + e2plus(x, 0)$.

The constraint for rule 3.1 is s3.1': $k_1 + k_3 = 2$ and the constraint for
rule 3.2 is s3.2: $2k_1 = 3 k_1 x + 3 k_3$.
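
The search for maximal consistent subsets of {s1, s2, s3, s4} can be sketched in Python as follows. This is a brute-force illustration under stated assumptions, not the procedure of subsection 3.2: parameters and the variable $x$ are sampled from a small range of naturals (the constant `BOUND` is an assumption), and the names `CONSTRAINTS`, `consistent` and `maximal_consistent_subsets` are hypothetical.

```python
from itertools import combinations, product

# Constraints s1-s4 for P2: e2plus(x, y) = k1*x + k2*y + k3.
CONSTRAINTS = {
    "s1": lambda k1, k2, k3, x: k3 == 1,
    "s2": lambda k1, k2, k3, x: k2 == 1,
    "s3": lambda k1, k2, k3, x: k1 == k1 * x + k3,
    "s4": lambda k1, k2, k3, x: k2 == 1,
}

BOUND = 4  # assumed search bound for the parameters and for the variable x


def consistent(names):
    """True if some parameter values satisfy every named constraint for all sampled x."""
    return any(
        all(CONSTRAINTS[n](k1, k2, k3, x) for n in names for x in range(BOUND))
        for k1, k2, k3 in product(range(BOUND), repeat=3)
    )


def maximal_consistent_subsets():
    """Collect consistent subsets, largest first, skipping subsets of earlier finds."""
    found = []
    for size in range(len(CONSTRAINTS), 0, -1):
        for names in combinations(sorted(CONSTRAINTS), size):
            candidate = set(names)
            if consistent(candidate) and not any(candidate < s for s in found):
                found.append(candidate)
    return found


subsets = maximal_consistent_subsets()
print(subsets)
```

Under these assumptions the sketch reports {s1, s2, s4}, the subset used in the text, together with {s2, s3, s4}, which corresponds to $k_1 = 0$, $k_2 = 1$, $k_3 = 0$.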

The rules to be considered for speculating a conjecture are {1, 2, 3.1, 4}.
From the left sides of these rules, a possible conjecture with $e2plus(0, y)$
as the left side is generated (since the left sides of rules 1 and 2 are
$e2plus(0, 0)$ and $e2plus(0, s(y))$; see subsection 2.4 for details).

The rules {1, 2, 3.1, 3.2, 4} are analyzed to pick the instances which are
likely to be used to compute $e2plus(0, x)$. Rules 1 and 2 constitute such a
complete set, since any ground instance of $e2plus(0, x)$ can be normalized to
a PA-term using them (see subsection 3.3 for details). The constraints
generated from rules 1 and 2 have the solution $\{k_2 = 1, k_3 = 1\}$, which
generates the lemma $e2plus(0, x) = s(x)$. We make it rule 5 and add it to the
definition of e2plus.

5. $e2plus(0, y) \to s(y)$.

This lemma can replace rules 1 and 2, and can be used to speculate additional
lemmas from rules 3, 4 and 5, which completely define e2plus.

Rules associated with the maximal set {1, 2, 3.1, 4} may also be expanded to
generate additional rules for generating left sides for conjectures (see
subsection 3.3 for details). The expansion is done by choosing a rule whose
right side is a PA-term, e.g., rule 3.1, and splitting by unifying the
recursive calls in other rules with the left side of rule 3.1. The unification
of the recursive call $e2plus(s(x), y)$ in rule 4 with $e2plus(s(0), 0)$ of
rule 3.1 produces four equivalent rules, two of which are:

4.1 $e2plus(s(0), s(0)) \to 3$,
4.2 $e2plus(s(0), s(s(y))) \to s(s(e2plus(s(0), y)))$.

The conjecture with the left side $e2plus(s(0), y)$ can now be hypothesized by
abstracting from the left sides of rules 3.1, 4.1, and 4.2, which completely define

³ Other values of $x$ produce instances with $k_3 = 0$, which contradicts s1
and hence cannot be added to {s1, s2, s4}.
$e2plus(s(0), y)$. The associated constraints, s3.1: $k_1 + k_3 = 2$,
s4.1: $k_1 + k_2 + k_3 = 3$ and s4.2: $k_2 = 1$, are solved and give
$e2plus(s(0), y) \to y + 2$ as the new lemma.
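
As a sanity check on the two generated lemmas, the following Python sketch evaluates e2plus directly by rules 1-4 and compares the results with the closed forms on sampled inputs. It assumes that s(n) is read as n + 1 on the natural numbers; the names `lemma_zero` and `lemma_one` are hypothetical, and a finite sample is of course not a proof.

```python
def e2plus(x: int, y: int) -> int:
    """Evaluate e2plus by rules 1-4, reading s(n) as n + 1 on the naturals."""
    if x == 0 and y == 0:
        return 1                                    # rule 1
    if x == 0:
        return e2plus(0, y - 1) + 1                 # rule 2
    if y == 0:
        return e2plus(x - 1, 0) + e2plus(x - 1, 0)  # rule 3
    return e2plus(x, y - 1) + 1                     # rule 4


# Lemma (rule 5): e2plus(0, y) = s(y)
lemma_zero = all(e2plus(0, y) == y + 1 for y in range(50))
# Lemma: e2plus(s(0), y) = y + 2
lemma_one = all(e2plus(1, y) == y + 2 for y in range(50))
print(lemma_zero, lemma_one)
```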

In order for the approach to be successful, a decidable theory $\mathcal{T}$
needs to have the following properties:

1. The no-theory condition introduced in [6] for $\mathcal{T}$ must be
decidable, i.e., whether a term involving defined function symbols with
$\mathcal{T}$-based definitions is equivalent to a $\mathcal{T}$-term (without
any defined symbols) is decidable.

2. Given a finite set of equations relating parameterized terms in canonical
form of $\mathcal{T}$, their solvability can be decided and furthermore, if
they are solvable, all possible most general solutions for parameters for all
values of variables can be found. This problem is related to the unification
problem over $\mathcal{T}$. For example, in the case of the quantifier-free
theory of Presburger arithmetic, given two parametric expressions, say
$k_1 x_1 + k_2 x_2 + k_2 x_2 + k_0 = k_3 x_1 + k_3 x_1 + k_2 x_2 + k_4 x_2$,
where $k_0, k_1, k_2, k_3, k_4$ are parameters, the most general solution is
$k_1 = k_3 + k_3$, $k_4 = k_2$, $k_0 = 0$.

3. Terms in $\mathcal{T}$ have a small finite set of canonical forms (to keep
the branching factor of the search space low); e.g., the quantifier-free theory
of Presburger arithmetic has such a form $\sum k_i x_i + k_0$, where the $x_i$
are the variables and the $k_i$ are the parameters; similarly, the
quantifier-free theory of free constructors also admits such canonical forms: a
term in this theory is either a variable or one of $c_i(t_1, t_2)$, where $c_i$
is a free constructor in the theory and $t_1, t_2$ are parametric terms.

The rest of the paper is organized as follows. Section 2 provides background
and reviews the definition of a $\mathcal{T}$-based recursive definition,
inductive validity, the cover set method, etc.; more details can be found in
[5,6]. Section 3 introduces the procedure for generating simple lemmas. There
are four major building blocks:

1. Given a conjecture of the form $f(s_1, \ldots, s_k) = r$, where $r$ is a
parametric term over $\mathcal{T}$, decide whether such an $r$ can be found; if
so, compute $r$. This operation is decidable for quantifier-free theories of
free constructors and Presburger Arithmetic (PA); that is why the focus in this
paper is on recursive function definitions over these two theories.

2. Given a finite set of parametric equations over $\mathcal{T}$, generate a
finite representation of all assignments of parameters which make the equations
true for all values of variables; further, if an assignment of parameters
cannot be found for which the equations are true for all values of variables,
then generate a finite representation of values of variables and the
corresponding assignments of parameters for which the equations are true.

3. Given a set of inconsistent parametric constraints associated with a given
rule set, generate maximal consistent subsets of constraints and the associated
rule sets.

4. Given a finite set of $k$-tuples of terms from a decidable theory
$\mathcal{T}$, generate a preferably more general and smaller set of $k$-tuples
of terms that has the same set of ground instances as the input. This
construction is needed to generate the left sides of conjectures. For a given
$lhs = f(s_1, \ldots, s_k)$, where $f$ has a $\mathcal{T}$-based definition and
each $s_i$ is a $\mathcal{T}$-term, compute from the definition of $f$ a
complete rule set for normalizing all ground instances of $lhs$ to a
$\mathcal{T}$-term.

Section 4 is a brief discussion of how the procedure of Section 3 works for a 
quantifier-free theory of free constructors that may admit multiple canonical 
forms. Section 5 gives an overview of the notion of compatibility among recursive 
function definitions and shows how the proposed method can be used to generate 
lemmas with multiple defined symbols on their left side. This is illustrated using 
examples of lemmas needed in proofs of verification of generic arbitrary data- 
width and parametric arithmetic circuits; more details about these experiments 
can be found in [9,10,12]. 

2 Background 

A brief overview of the relevant definitions and results from [11,5] is given
first; see also [6], where some of the results from [11] have been tightened.
The example theories considered in this paper are the quantifier-free theory of
Presburger arithmetic and the quantifier-free theory of free constructors
(natural numbers, which are generated by 0, s, and finite lists, which are
generated by nil, cons, are two examples of this generic theory).

The framework of many-sorted first-order logic where “=” is the only predicate
symbol is used below. For a set of function symbols $\mathcal{F}$ and an
infinite set of variables $\mathcal{V}$, we denote the set of (well-typed)
terms over $\mathcal{F}$ by $Terms(\mathcal{F}, \mathcal{V})$ and the set of
ground terms by $Terms(\mathcal{F})$. Terms are represented as trees. Let
$root(t)$ stand for the function symbol at the root of a term $t$ and
$\mathcal{V}(t)$ denote the variables of $t$. Given a theory $\mathcal{T}$,
“$\models$” is the usual (semantic) first-order consequence relation;
$\mathcal{F}_\mathcal{T}$ is the set of function symbols of $\mathcal{T}$;
$Terms(\mathcal{F}_\mathcal{T}, \mathcal{V})$ denotes the terms of
$\mathcal{T}$. Terms in $Terms(\mathcal{F}_\mathcal{T}, \mathcal{V})$ are
called $\mathcal{T}$-terms. Below, $x^*$ stands for a $k$-tuple
$(x_1, \ldots, x_k)$.

Definition 1 ($\mathcal{T}$-based Function [11]). A function
$f \in \mathcal{F}$ is $\mathcal{T}$-based iff all rules
$l \to r \in \mathcal{R}$ with $root(l) = f$ have the form
$f(s^*) \to C[f(t_1^*), \ldots, f(t_n^*)]$, where $s^*, t_1^*, \ldots, t_n^*$
are $k$-tuples of $\mathcal{T}$-terms and $C$ is a context over
$\mathcal{F}_\mathcal{T}$. Moreover, we assume that all terms $f(t_i^*)$ are in
normal form.

Let $D_f$ be the subset of rules $l \to r \in \mathcal{R}$ with $root(l) = f$.
$D_f$ is called a $\mathcal{T}$-based definition of a $\mathcal{T}$-based
function $f$. Henceforth, any $\mathcal{T}$-based definition is assumed to be
terminating, sufficiently complete and ground confluent modulo
$=_\mathcal{T}$, i.e., for every $k$-tuple of ground $\mathcal{T}$-terms $g^*$,
$f(g^*)$ normalizes in finitely many steps, using $D_f$, to a unique ground
term (equivalent modulo $=_\mathcal{T}$) in the range sort of $f$.

Definition 2 (Cover Set). Given a $\mathcal{T}$-based function definition $D_f$
of $f$, its cover set is
$C_f = \{\langle s^*, \{t_1^*, \ldots, t_n^*\} \rangle \mid f(s^*) \to C[f(t_1^*), \ldots, f(t_n^*)] \in D_f\}$.

Using the cover set method, an induction proof of a conjecture $q = r$ in which
$f$ appears leads to the following subgoal for every
$\langle s^*, \{t_1^*, \ldots, t_n^*\} \rangle \in C_f$, the cover set of $f$:

$[\sigma_1(q = r) \wedge \ldots \wedge \sigma_n(q = r)] \Rightarrow \theta(q = r)$, (1)

where $\theta = \{x^* \to s^*\}$, and each $\sigma_i = \{x^* \to t_i^*\}$. If
all formulas (1) are inductively valid, then by Noetherian induction, $q = r$
is also inductively valid.



Definition 3 (Simple Conjecture). A conjecture $f(x_1, \ldots, x_k) = r$, where
$f$ is $\mathcal{T}$-based, $x_1, \ldots, x_k$ are distinct variables, and $r$
is a $\mathcal{T}$-term, is called simple.

For example, the function cost above is based in Presburger arithmetic; the 
parameterized conjecture PI about cost is a simple conjecture. 

In [11], it is shown:

Theorem 1. The inductive validity of a simple conjecture
$f(x_1, \ldots, x_k) = r$, where $f$ is $\mathcal{T}$-based, can be decided
based on its cover set.

2.1 Quantifier-Free Theory of Presburger Arithmetic (PA)

Following [6], we use the following definition for the theory PA of Presburger
Arithmetic. $\mathcal{F}_{PA} = \{0, 1, +\}$ and $AX_{PA}$ consists of the
following formulas:

$(x + y) + z = x + (y + z)$, $\neg(1 + x = 0)$,
$x + y = y + x$, $x + y = x + z \Rightarrow y = z$,
$0 + y = y$, $x = 0 \vee \exists y.\ x = y + 1$.

For a PA-term $t$ with $\mathcal{V}(t) = \{x_1, \ldots, x_m\}$, there exist
$a_i \in \mathbb{N}$ such that
$t =_{PA} a_0 + a_1 \cdot x_1 + \ldots + a_m \cdot x_m$. Here, “$a \cdot x$”
denotes the term $x + \ldots + x$ ($a$ times) and “$a_0$” denotes
$1 + \ldots + 1$ ($a_0$ times). For
$s =_{PA} b_0 + b_1 \cdot x_1 + \ldots + b_m \cdot x_m$ and $t$ as above,
$s =_{PA} t$ iff $a_0 = b_0, \ldots, a_m = b_m$.
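
The coefficient-wise equality test for pure PA-terms can be illustrated with a short Python sketch. The term encoding is an assumption made here for illustration (0, 1, variable names as strings, and tuples `("+", left, right)`), and the helper names `canonical` and `pa_equal` are hypothetical.

```python
from collections import Counter


def canonical(term) -> Counter:
    """Coefficients of a pure PA-term built from 0, 1, variables and ('+', l, r).

    Variables are strings; the reserved key "1" holds the constant a_0,
    so no variable may be named "1" in this encoding.
    """
    if isinstance(term, str):
        return Counter({term: 1})
    if isinstance(term, tuple):
        op, left, right = term
        assert op == "+"
        return canonical(left) + canonical(right)
    if term == 0:
        return Counter()
    if term == 1:
        return Counter({"1": 1})
    raise ValueError(f"unsupported term: {term!r}")


def pa_equal(s, t) -> bool:
    """s =_PA t iff the canonical forms agree coefficient by coefficient."""
    return canonical(s) == canonical(t)


s = ("+", ("+", "x", 1), "y")
t = ("+", "y", ("+", 1, "x"))
print(pa_equal(s, t), pa_equal(("+", "x", "x"), "x"))
```

Associativity and commutativity of + are absorbed by the multiset of coefficients, which is why comparing the two `Counter` values suffices for pure terms in this encoding.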

When we conjecture a term $t$ involving symbols outside PA to be expressible
in PA, i.e., equivalent to some term in PA but we do not know precisely which
term, we have to use a parametric term of the form $\sum k_i x_i + k_0$, where
$k_0, \ldots, k_n$ are unknown (hence, parameters). Let $\mathcal{K}$ be a
finite set of parameters, distinct and disjoint from the variables
$\mathcal{V}$. For $t$ to be expressible in PA, there must exist values of
$k_0, \ldots, k_n$ such that $t =_{PA} \sum k_i x_i + k_0$; otherwise, if $t$
is not expressible in PA, then there cannot exist such values of
$k_0, \ldots, k_n$.

There are thus two kinds of terms: (i) a PA-term consisting of variables
and functions in PA, whose canonical form is $\sum a_i x_i + a_0$, where
$x_i \in \mathcal{V}$ and $a_0, \ldots, a_m$ are nonnegative numbers, and (ii)
a parametric term consisting of variables, parameters, and functions in PA,
whose canonical form is $\sum a_i x_i + a_0$, where $x_i \in \mathcal{V}$ and
$a_0, \ldots, a_m$ are linear polynomials in parameters in $\mathcal{K}$. Since
the second kind of term subsumes the first kind, by a term in PA below, we
mean the second kind of term and call it a parametric term, in contrast to the
first kind of term, which we will refer to as a pure term.

A (pure) equation in PA is $\sum a_i x_i + a_0 = \sum b_i x_i + b_0$, where
$a_0, \ldots, a_m, b_0, \ldots, b_m$ are nonnegative numbers. Using
cancellativity, we can simplify the equation so that every variable appears
only on one side. A canonical form of the equation is
$\sum c_i x_i + c_0 = 0$, where $c_0, \ldots, c_m$ are integers (including
negative numbers); this is an abbreviation for the equation where variables
with positive coefficients are on one side of the equation and variables with
negative coefficients are on the other side. For example, $2x - 3y - 1 = 0$
stands for $2x = 3y + 1$.

Since it is necessary to consider equations involving parametric terms as well,
by a parametric equation, we mean $\sum p_i x_i + p_0 = 0$, where
$x_i \in \mathcal{V}$ and each of $p_0, \ldots, p_m$ is a linear polynomial in
parameters in $\mathcal{K}$ with integer coefficients. Again, since a
parametric equation subsumes a pure equation, below we only refer to parametric
equations.

A parametric equation is valid if and only if it is true for all values of its
variables and parameters. As an example, the parametric equation
$k_1 x_1 + k_2 x_2 = k_1 x_1 + k_2 x_2$ is valid.

A parametric equation is strongly satisfiable if and only if for all values of
the variables, there exist parameter values that make the equation true. As an
example, the parametric equation $(k_1 + k_2) x_1 + k_2 x_2 + k_3 - 1 = 0$ is
strongly satisfiable with parameter values $k_1 = k_2 = 0$ and $k_3 = 1$.
Similarly, $(k_1 - k_2) x = 0$ is strongly satisfiable with parameter values
$k_1 = k_2$. A parametric equation $\sum p_i x_i + p_0 = 0$ is strongly
satisfiable if and only if there exist parameter values for which the
conjunction of the equations $p_0 = 0, \ldots, p_m = 0$ is true.

A parametric equation is weakly satisfiable if and only if there exist variable
and parameter values that make the equation true. As an example, the parametric
equation $(k_1 + k_2) x_1 + k_2 x_2 + k_1 - 1 = 0$ is weakly satisfiable for
variable values $x_1 = x_2 = 0$ and parameter values $k_1 = 1$ ($k_2$ is
indeterminate and can be any value). This equation is not strongly satisfiable
since the conjunction of equations $k_1 + k_2 = 0$, $k_2 = 0$ and
$k_1 - 1 = 0$ cannot be satisfied for any parameter values $k_1$ and $k_2$.

A parametric equation is unsatisfiable if there do not exist any values of
parameters and variables which make the equation true. As an example, the
equation $2 k_1 x - 1 = 0$ is unsatisfiable.

Every strongly satisfiable parametric equation is also weakly satisfiable.
However, an equation may be weakly satisfiable under different sets of
parameter values than the parameter values under which the equation is strongly
satisfiable. As an example, consider the equation $k_1 = k_1 x + k_3$. The
equation is strongly satisfiable when $k_1 = k_3 = 0$. However, the equation is
also weakly satisfiable for $x = 0$ and $k_1 = k_3 = 1$ (in fact, any value
insofar as $k_1 = k_3$).

A parametric equation that is weakly satisfiable but not strongly satisfiable
is called a weak parametric equation. Whether a given parametric equation
$\sum p_i x_i + p_0 = 0$ is weak or not can be easily checked by simplifying it
to $p_0 = 0, \ldots, p_m = 0$ and finding a solution for parameters. If there
is a solution, then the equation is not weak; otherwise it is weak.
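
The three notions can be contrasted on the examples of this subsection with a small Python sketch. It is a bounded brute-force illustration and not a decision procedure: parameters and variables are searched only among naturals below an assumed `bound`, each coefficient polynomial $p_i$ is given as a Python function of the parameter tuple, and the name `classify` is hypothetical.

```python
from itertools import product


def classify(coeffs, n_params, n_vars, bound=4):
    """Classify sum_i p_i * x_i + p_0 = 0, given coeffs = [p_0, p_1, ..., p_m].

    Each p_i maps a tuple of parameter values to an integer. The search over
    naturals below `bound` is an assumption of this sketch.
    """
    params = list(product(range(bound), repeat=n_params))
    # Strongly satisfiable: some parameter values make every p_i vanish.
    if any(all(p(k) == 0 for p in coeffs) for k in params):
        return "strong"
    # Weakly satisfiable: some parameter and variable values satisfy the equation.
    for k in params:
        for xs in product(range(bound), repeat=n_vars):
            if coeffs[0](k) + sum(p(k) * x for p, x in zip(coeffs[1:], xs)) == 0:
                return "weak"
    return "unsatisfiable"


# (k1 + k2) x1 + k2 x2 + k3 - 1 = 0
eq_strong = classify([lambda k: k[2] - 1, lambda k: k[0] + k[1], lambda k: k[1]], 3, 2)
# (k1 + k2) x1 + k2 x2 + k1 - 1 = 0
eq_weak = classify([lambda k: k[0] - 1, lambda k: k[0] + k[1], lambda k: k[1]], 2, 2)
# 2 k1 x - 1 = 0
eq_unsat = classify([lambda k: -1, lambda k: 2 * k[0]], 1, 1)
print(eq_strong, eq_weak, eq_unsat)
```

The first test mirrors the check described above: the equation is simplified to $p_0 = 0, \ldots, p_m = 0$ and a parameter solution is sought; only when that fails does the sketch look for a joint assignment of variables and parameters.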

A set $S$ of parametric equations is consistent if and only if for all values
of variables, there exist parameter values which make each equation in $S$
true, i.e., for all $x_i$ there exist $k_i$ such that the conjunction of the
equations in the set is true. If $S$ includes a weak parametric equation, it
cannot be consistent. So a consistent $S$ must only include strongly
satisfiable parametric equations.

A parametric constraint is defined to be a linear equation over parameters. If
a set $S$ of parametric equations is consistent, then there exists a set of
parametric constraints equivalent to it, obtained by projection (elimination of
variables).

2.2 Quantifier-Free Theory of Free Constructors

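For the theory $\mathcal{T}_C$ of free constructors, $AX_{\mathcal{T}_C}$
consists of the following formulas.

$\neg\, c(x^*) = c'(y^*)$ for all $c, c' \in \mathcal{F}_{\mathcal{T}_C}$ where $c \neq c'$
$c(x_1, \ldots, x_n) = c(y_1, \ldots, y_n) \Rightarrow x_1 = y_1 \wedge \ldots \wedge x_n = y_n$ for all $c \in \mathcal{F}_{\mathcal{T}_C}$
$\bigvee_{c \in \mathcal{F}_{\mathcal{T}_C}} \exists y^*.\ x = c(y^*)$
$\neg(c_1(\ldots c_2(\ldots c_n(\ldots x \ldots) \ldots) \ldots) = x)$ for all sequences $c_1, \ldots, c_n$, $c_i \in \mathcal{F}_{\mathcal{T}_C}$

Note that the last type of axiom usually results in infinitely many formulas.
Here, “...” in the arguments of $c_i$ stands for pairwise different variables.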

A term in $\mathcal{T}_C$ can be either a variable or of the form
$c_i(t_1, \ldots, t_k)$, where $c_i$ is a constructor and the $t_j$’s are terms
in $\mathcal{T}_C$. As in PA, to check whether a term $s$ involving symbols
outside $\mathcal{T}_C$ is expressible in $\mathcal{T}_C$, we hypothesize $s$
to be equivalent to some term in $\mathcal{T}_C$ without knowing exactly which
one, i.e., $s$ is equivalent to $x$, where $x \in s$, or to
$c_i(t_1, \ldots, t_k)$, where each $t_j$ is a parameter standing for some term
in $\mathcal{T}_C$ (expressed using variables in $s$).

A parametric term over $\mathcal{T}_C$ is thus a parameter, a variable, or
$c_i(t_1, \ldots, t_k)$, where each $t_j$ is a parametric term. Using the above
axioms of $\mathcal{T}_C$, an equation over parametric terms can be checked for
unsatisfiability. If an equation is satisfiable, then values of parameters
making it true can be determined.

2.3 Splitting a Rule 

In the procedure in Section 3, it becomes necessary to replace a rule in a
definition by a finite set of equivalent rules. This splitting of a rule is
guided by a substitution of variables in the rule. For a substitution $\sigma$
assigning a variable $x$ the term $\sigma(x)$, a finite set of terms including
$\sigma(x)$ must be computed such that their ground instances cover all the
ground terms in the sort of $x$.

Given a linear $\mathcal{T}$-term $s$ in which variables appear at most once,
let $cover(s)$ be a finite (preferably, minimal) set of terms in canonical form
including $s$ such that every ground term of $sort(s)$ is equivalent in
$\mathcal{T}$ to $\theta(t)$ for some ground substitution $\theta$ and
$t \in cover(s)$. For example, in the theory of free constructors,
$cover(x) = \{x\}$ and
$cover(c_i(x, y)) = \{c_0(\ldots), c_1(\ldots), \ldots, c_i(x, y), \ldots, c_m(\ldots)\}$,
where $c_0, \ldots, c_m$ are all the constructors of $sort(x)$. For PA,
$cover(x) = \{x\}$; $cover(x + y) = \{x + y\}$, $cover(0) = \{0, x + 1\}$,
$cover(n) = \{0, \ldots, n, x + n + 1\}$, etc.

Given a substitution $\sigma = \{x_1 \to s_1, \ldots, x_n \to s_n\}$, where the
$s_i$’s are linear, pairwise disjoint (in variables) $\mathcal{T}$-terms in
canonical form, the set of substitutions complement to $\sigma$, denoted as
$compl(\sigma)$, is
$\{\theta = \{x_1 \to t_1, \ldots, x_n \to t_n\} \mid \langle t_1, \ldots, t_n \rangle \in cover(s_1) \times cover(s_2) \times \cdots \times cover(s_n) \setminus \langle s_1, \ldots, s_n \rangle\}$.⁴

⁴ Test sets used for checking the sufficient completeness property of a
function definition can be used for this purpose; see, e.g., [7].
Guided by a substitution $\sigma$, a rule $l \to r$ can be split into a finite
equivalent set of rules
$\{\theta(l) \to \theta(r) \mid \theta \in compl(\sigma) \cup \{\sigma\}\}$,
denoted by $split(\{l \to r\}, \sigma)$.

It is useful to normalize the right side of the rules (using the rule set under
consideration).

Consider a substitution $\sigma = \{x \to 0, y \to 0\}$ over the theory of naturals
generated by $0$, $s(x)$. Then $compl(\sigma) = \{\{x \to 0, y \to s(y')\}, \{x \to s(x'), y \to 0\}, \{x \to s(x'), y \to s(y')\}\}$
has three substitutions obtained by deleting the element $(0, 0)$ from the cross
product $cover(0) \times cover(0)$. A rule such as
4: $e2plus(s(x), s(y)) \to s(e2plus(s(x), y))$ in Section 1 can be split into four
equivalent rules, one for each substitution in $\{\sigma\} \cup compl(\sigma)$. The rule
4.1: $e2plus(s(0), s(0)) \to s(e2plus(s(0), 0))$ is generated using $\sigma$;
4.2: $e2plus(s(0), s(s(y'))) \to s(e2plus(s(0), s(y')))$ comes from the first substitution
in $compl(\sigma)$; two more rules are similarly generated from the other two
substitutions in $compl(\sigma)$.
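The splitting step can be sketched in a few lines of Python for the theory of naturals generated by 0 and s. This is only an illustration under stated assumptions: terms are encoded as nested tuples, variables as strings, fresh variables are named by priming the substituted variable, and `cover(s(t))` is taken to be 0 together with s applied to the cover of t. None of these encoding choices come from the paper.

```python
from itertools import product

ZERO = ("0",)

def s(t):
    """Build the successor term s(t)."""
    return ("s", t)

def cover(term, fresh):
    """Cover set of a linear term over the constructors 0 and s (assumed encoding)."""
    if isinstance(term, str):      # a variable already covers every ground term
        return [term]
    if term == ZERO:               # 0 is complemented by s(fresh)
        return [ZERO, s(fresh)]
    # term = s(t): a ground term is either 0 or s(u) with u covered by cover(t)
    return [ZERO] + [s(u) for u in cover(term[1], fresh)]

def compl(sigma):
    """Substitutions in the product of the covers, with sigma itself removed."""
    names = list(sigma)
    covers = [cover(sigma[v], v + "'") for v in names]
    original = [sigma[v] for v in names]
    return [dict(zip(names, combo))
            for combo in product(*covers)
            if list(combo) != original]

def apply(term, theta):
    """Apply a substitution to a term."""
    if isinstance(term, str):
        return theta.get(term, term)
    return (term[0],) + tuple(apply(a, theta) for a in term[1:])

def split(rule, sigma):
    """Split a rule into one instance for sigma and one per complement substitution."""
    lhs, rhs = rule
    return [(apply(lhs, th), apply(rhs, th)) for th in [sigma] + compl(sigma)]

sigma = {"x": ZERO, "y": ZERO}
rule4 = (("e2plus", s("x"), s("y")), s(("e2plus", s("x"), "y")))
rules = split(rule4, sigma)
```

With these definitions, `rules[0]` and `rules[1]` correspond to rules 4.1 and 4.2 of the running example, and `compl(sigma)` lists the three complement substitutions in the same order as the text.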

2.4 Abstracting Terms 

In the procedure in Section 3, it will be necessary to generate a term from the
left sides of a finite set of rules in a T-based definition, in order to speculate
an equational conjecture with that term as one of its sides. To perform this
abstraction operation, we compute, from a given finite set $L$ of $k$-tuples of
variable-disjoint T-terms, a (preferably smaller) set $N$ of $k$-tuples of terms such
that the ground instances of $N$ and $L$ are identical. (In the worst case, this
operation will return $L$ itself.)

The abstraction operation $ABS$ on a finite set of tuples is defined recursively
by first defining $abs$ on a finite set of terms. For the theory of free constructors,

1. $abs(\{t\}) = \{t\}$ for any $t$;

2. $abs(\{x\} \cup L) = \{x\}$;

3. $abs(\{t_1, t_2\} \cup L) = abs(\{t_1\} \cup L)$ if $t_2$ is an instance of $t_1$;

4. $abs(\{c_1(x^*), c_2(y^*), \dots, c_l(z^*)\} \cup L) = abs(\{x\} \cup L)$ where the $c_j$'s are all the $l$
constructors of $T_C$ and $x$ does not appear in $L$;

5. $abs(\{c_i(s^*), \dots, c_i(t^*)\} \cup L) = abs(\{c_i(u^*) \mid u^* \in ABS(\{s^*, \dots, t^*\})\} \cup L)$
where $L$ does not have any term with $c_i$ as its outermost symbol;

6. otherwise, $abs$ returns the input itself.

The function $ABS$ on $k$-tuples is defined as: $ABS(\{s^*\}) = \{s^*\}$;
$ABS(\{s_1^*, \dots, s_m^*\} \cup L) = ABS(M \cup L)$, where $s_1^*, \dots, s_m^*$ differ only in their
$i$-th component, $1 \le i \le k$, $L$ does not include any $k$-tuple that is the same as
$s_1^*$ except for its $i$-th component, and
$M = \{t^* \mid t^* = s_1^*$ except for the $i$-th component and $t_i \in abs(\{s_{1i}, \dots, s_{mi}\})\}$.

For PA, every ground term can be represented as $0$ or $s^k(0)$, $k > 0$, where
$s$ is a free constructor; every term with unique occurrences of variables can be
represented as $x + y + \cdots + u$ or $s^k(x + y + \cdots + u)$. The function $abs$ is defined as:

1. $abs(\{x + y + \cdots + u\} \cup L) = \{x\}$;

2. $abs(\{s^k(x + y + \cdots + u)\} \cup L) = \{s^k(x)\} \cup L$;

3. $abs(\{t_1, t_2\} \cup L) = abs(\{t_1\} \cup L)$ if $t_2$ is an instance of $t_1$;

4. $abs(\{0, s(x)\} \cup L) = abs(\{x\} \cup L)$, where $x$ does not appear in $L$;

5. $abs(\{s(t_1), \dots, s(t_m)\} \cup L) = abs(\{s(u) \mid u \in abs(\{t_1, \dots, t_m\})\} \cup L)$ where
$L$ does not have any term with $s$ as its outermost symbol;

6. otherwise, $abs$ returns the input itself.

For instance, the three left sides $e2plus(s(0), 0)$, $e2plus(s(0), s(0))$ and
$e2plus(s(0), s(s(x)))$ of rules 3.1, 4.1 and 4.2 in our running example can be used
to formulate the left side of a new conjecture. To compute
$ABS(\{(s(0), 0), (s(0), s(0)), (s(0), s(s(x)))\})$, apply $abs$ on the second component
$\{0, s(0), s(s(x))\}$, which gives $abs(\{s(0), s(s(x))\} \cup \{0\}) = abs(\{s(x'), 0\}) = \{y\}$.
The result of $ABS$ is thus $ABS(\{(s(0), y)\}) = \{(s(0), y)\}$. The abstracted term is
$e2plus(s(0), y)$, leading to the lemma $e2plus(s(0), y) \to s(s(y))$.
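A simplified version of $abs$ for the constructors 0 and s can be sketched as follows. It is a hedged illustration rather than the paper's full operation: it handles only single terms (not tuples), it omits the instance check of rule 3, and the tuple encoding, the helper names and the fresh-variable scheme are all assumptions made here.

```python
from itertools import count

ZERO = ("0",)
_fresh = count(1)

def s(t):
    """Build the successor term s(t)."""
    return ("s", t)

def is_var(t):
    """Variables are encoded as plain strings (an assumption of this sketch)."""
    return isinstance(t, str)

def fresh_var():
    """Return a variable name not used before (hypothetical naming scheme)."""
    return f"v{next(_fresh)}"

def abs_terms(terms):
    """Abstract a set of linear terms over 0 and s to a smaller equivalent set."""
    terms = set(terms)
    for t in terms:                      # rule 2: a variable subsumes everything
        if is_var(t):
            return {t}
    s_terms = {t for t in terms if t[0] == "s"}
    rest = terms - s_terms               # here: at most {0}
    if s_terms:                          # rule 5: abstract under the constructor s
        inner = abs_terms({t[1] for t in s_terms})
        s_terms = {s(u) for u in inner}
    merged = s_terms | rest
    # rule 4: 0 together with s(variable) covers the whole sort
    if ZERO in merged and any(is_var(t[1]) for t in merged if t[0] == "s"):
        return {fresh_var()}
    return merged                        # rule 6: nothing more to abstract

second_components = {ZERO, s(ZERO), s(s("x"))}
abstracted = abs_terms(second_components)
```

On the second components of the running example the sketch collapses the three terms to a single fresh variable, which plays the role of $y$ in the text.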

3 Generating Simple Lemmas 

Before discussing the main procedure for generating simple lemmas, the four main
building blocks of the procedure are presented. The procedure can generate
lemmas whose left sides are of the form $f(s_1, \dots, s_k)$, where the subset of rules
from $D_f$ which can be used to normalize every ground instance of $f(s_1, \dots, s_k)$
to a T-term have left sides which match $f(s_1, \dots, s_k)$; this condition is necessary
to generate parametric constraints from the rules defining $f(s_1, \dots, s_k)$, as shown
below. Even though the presentation below is given using T = PA, the procedure
works for any decidable theory satisfying the properties discussed in the
introduction. In the next section, the procedure is illustrated for a theory T whose
terms can have one of many canonical forms; in contrast, here terms of PA can be
represented using a single canonical form.

3.1 Generating Right Side of a Conjecture 

In the procedure, conjectures are speculated by constructing terms of the form
$lhs = f(s_1, \dots, s_k)$, where each $s_i$ is a T-term, and determining whether
$f(s_1, \dots, s_k)$ is equivalent to some T-term, say $rhs$, such that $lhs = rhs$ is an
inductive consequence of $D_f$. For this, every possible candidate for $rhs$, i.e., every
canonical form of a T-term, is attempted. If no such $rhs$ exists, then instances of
$lhs$ may be candidates to serve as the left side of other conjectures; this will be
discussed in the next subsection. In subsection 3.3, a method for generating such
$lhs$'s is given; furthermore, a method for generating from the definition $D_f$ of $f$
the rule set $D_{lhs}$, consisting of the rules necessary to normalize every ground
instance of $lhs$ to a T-term, is also presented.

For PA, $rhs = \sum k_i x_i + k_0$ and a solution to the parameters is attempted.
The variables $x_i$ are all the variables in $V(lhs)$. For each rule $l_i \to r_i$ in
$D_{lhs}$, a parametric equation is generated by replacing each term of the form
$f(t_1, \dots, t_k)$ in the rule by $\sigma(\sum k_i x_i + k_0)$, where
$\sigma(f(s_1, \dots, s_k)) = f(t_1, \dots, t_k)$. To ensure that a parametric equation can be
generated corresponding to a rule $l_i \to r_i$, $l_i$ must match with $lhs$ and each
recursive call $t$ to $f$ in $r_i$ must either match with $lhs$ or $\sigma(t)$ must simplify
to a T-term.

Since nested recursive calls are not allowed in the right sides of rules in a PA-based
function definition, a constraint so obtained is a parametric equation in PA.

Let $S$ be the finite set of all the parametric equations so generated. A
parametric equation generated from a rule may simplify using PA to a parametric
constraint, which is a linear polynomial in parameters. If $S$ includes an
unsatisfiable parametric equation or a weak parametric equation, then $S$ is not
consistent. Values of parameters constrained by different parametric constraints
may clash as well. If $S$ is consistent, i.e., for all values of variables, parameter
values can be found making each equation in $S$ valid, then the conjecture
$lhs = \delta(\sum k_i x_i + k_0)$, where $\delta$ is the most general solution of the set of
parametric constraints generated from $S$, is a lemma. This implies that $lhs$ is
expressible in PA. The lemma generated is added to $D_f$ to simplify other rules
and lemmas.

In case $S$ is inconsistent, parametric constraints in $S$ are used to compute
maximal consistent subsets, from which instances of $lhs$ that can serve as the left
sides of new conjectures are generated, as discussed in the next subsection.

3.2 Generating Maximal Consistent Subsets 

In the previous step, if the set $S$ of parametric equations generated from a
given $lhs = f(s_1, \dots, s_k)$ is inconsistent, $S$ is used to generate instances of $lhs$
possibly serving as the left sides of new conjectures about $f$. Toward this goal,
every maximal consistent subset of $S$ and the associated rule set are computed
from $S$ and the associated rule set $D_{lhs}$.

First, every unsatisfiable parametric equation is deleted from $S$. The
remaining equations are partitioned into the subsets $S_{cn}$, $S_{st}$ and $S_{wk}$, standing
for parametric constraints, strongly satisfiable parametric equations (which involve
both variables and parameters) and weak parametric equations (which also
involve both variables and parameters), respectively.

A strongly satisfiable parametric equation $st : \sum p_i x_i + p_0 = 0$ in $S_{st}$ can
either be viewed as a conjunction of parametric constraints $c : p_0 = 0, \dots, p_m = 0$,
which can be included in $S_{cn}$, or it can be viewed as a weak parametric equation
and be included in $S_{wk}$. For each $st \in S_{st}$, there are two cases, leading to
exponentially many possibilities, each consisting of a pair $(PC, WPE)$, where $PC$ is
the set of parametric constraints and $WPE$ is the set of weak parametric equations.
For each maximal consistent subset $MCS$ of $PC$, a most general solution $\delta$ for the
parameters is computed; let $R(MCS)$ stand for the associated rule set taken from
$D_{lhs}$. The solution $\delta$ is then applied on each weak parametric equation in $WPE$
to compute the values of variables, if any, such that the instantiated weak
parametric equation is consistent with $MCS$. For any such substitution of variables
of the weak parametric equation, the corresponding rule is split (see subsection
2.3) and $R(MCS)$ is extended to include the instance of the rule. The result is
a rule set $RE$ corresponding to $MCS$ and instances of rules corresponding to
weak parametric equations in $WPE$ which are consistent with $MCS$. For each
such possibility, the rule set $RE$ is then used in the next subsection to speculate
the left sides of the conjectures which are instances of $lhs$.
For example, consider the inconsistent set of parametric equations
$S = \{s1 : k_3 = 1,\ s2 : k_2 = 1,\ s3 : k_1 x + k_3 = k_1,\ s4 : k_2 = 1\}$ generated from
$D_{e2plus}$ in Section 1. Here $S_{cn} = \{s1 : k_3 = 1;\ s2 : k_2 = 1;\ s4 : k_2 = 1\}$,
$S_{st} = \{k_1 x + k_3 = k_1\}$, and $S_{wk} = \{\}$. There are two possibilities based on
$\{k_1 x + k_3 = k_1\}$:
$(PC : \{s1 : k_3 = 1;\ s2 : k_2 = 1;\ s3 : k_1 = 0, k_3 = 0;\ s4 : k_2 = 1\},\ WPE : \{\})$ and
$(PC : \{s1 : k_3 = 1;\ s2 : k_2 = 1;\ s4 : k_2 = 1\},\ WPE : \{s3 : k_1 x + k_3 = k_1\})$.
There are two maximal consistent subsets, $MCS_1 = \{k_1 = 0, k_3 = 0, k_2 = 1\}$ and
$MCS_2 = \{k_3 = 1, k_2 = 1\}$. Corresponding to $MCS_1$, the rule set is $\{2, 3, 4\}$.
For $MCS_2$, an instance of rule 3 generated from $s3$ is computed, giving $x$ to be
$0$; the resulting rule set is $\{1, 2, 3.1, 4\}$.
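For the first possibility in the example above, where every constraint in $PC$ simply fixes parameters to constants, the maximal consistent subsets can be found by brute-force enumeration. The sketch below assumes exactly that restricted shape (each constraint is a dictionary from parameter names to values); it is an illustration, not the algorithm used by the authors, and it makes no attempt to avoid the duplicated work mentioned in the text.

```python
from itertools import combinations

def consistent(constraints):
    """A set of assignments is consistent if no parameter gets two values."""
    values = {}
    for c in constraints:
        for param, value in c.items():
            if values.setdefault(param, value) != value:
                return False
    return True

def maximal_consistent_subsets(named):
    """Enumerate subsets from largest to smallest, keeping the maximal ones."""
    names = list(named)
    found = []
    for size in range(len(names), 0, -1):
        for subset in combinations(names, size):
            if any(set(subset) <= big for big in found):
                continue               # already contained in a larger consistent set
            if consistent([named[n] for n in subset]):
                found.append(set(subset))
    return found

# PC of the first possibility: s3 is read as the two constraints k1 = 0, k3 = 0.
pc = {
    "s1": {"k3": 1},
    "s2": {"k2": 1},
    "s3": {"k1": 0, "k3": 0},
    "s4": {"k2": 1},
}
mcs = maximal_consistent_subsets(pc)
```

The two subsets found, `{s1, s2, s4}` and `{s2, s3, s4}`, carry the parameter values of $MCS_2$ and $MCS_1$ respectively; the second one corresponds to the rule set $\{2, 3, 4\}$ named in the text.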



3.3 Identifying Left Sides and Their Complete Definitions 

Given a subset $R$ of rules of $D_f$ which possibly generate a consistent set of
parametric equations, left sides of conjectures need to be speculated. The rule
set $R$ is first preprocessed to ensure that most left sides of conjectures can be
generated. After preprocessing, for every speculated left side $lhs = f(s_1, \dots, s_k)$
of a conjecture, it must be ensured that every ground instance of $lhs$ can be
normalized to a ground T-term using the rules in $R$; if not, additional rules from
$D_f$ need to be added to $R$ to ensure this property (hopefully without violating
the consistency condition).

Preprocessing. Rules in $R$ are refined by splitting based on unifying recursive
calls with the left sides of nonrecursive rules in $R$. Let $l_i \to r_i$ be a nonrecursive
rule in $R$ (i.e., $r_i$ is a PA-term). For every recursive rule $l_j \to r_j$ in $R$, replace
it by $split(\{l_j \to r_j\}, \sigma)$ if a recursive call in $r_j$ unifies with $l_i$ with the mgu $\sigma$.
The expansion is performed for all possible pairs of nonrecursive rules $l_i \to r_i$
and recursive rules $l_j \to r_j$ in $R$. The output of the preprocessing step is a rule
set equivalent to the input rule set which may include instances of rules in the
input rule set. We will abuse the notation and let $R$ stand for the output of the
preprocessing step.

Speculating Left Sides. The left sides of the rules in $R$ are abstracted using
$ABS$, discussed in subsection 2.4, to generate $k$-tuples of terms from which the
left sides of new conjectures to be speculated are constructed. Let $LHS$ be the
finite set of possible left sides of the form $f(s_1, \dots, s_k)$.

Generating a Complete Definition for a Left Side. For each new left side
$lhs = f(s_1, \dots, s_k)$, its complete definition $D_{lhs}$ as a set of rules is computed
from $R$ using the function $complete$, which is initially invoked with $\{lhs\}$, $R$, $\{\}$
and a boolean flag set to false. The following invariant holds for each recursive
call $complete(t, S_1, S_2, b)$: $t$ is always of the form $f(s'_1, \dots, s'_k)$, where each $s'_i$ is
a T-term; every ground instance of $t$ can be normalized to a T-term using rules
in $S_1 \cup S_2$.



The reader might have noticed some duplication in computing maximal consistent 
subsets; better algorithms need to be investigated to avoid it. 
1. $complete(\{t\} \cup L, R, D, b) = complete(L, R, D, b)$ if $t$ rewrites using a rule
in $D$.

2. $complete(\{t\} \cup L, R, D, b) = complete(\{\theta_j(t) \mid \theta_j \in compl(\sigma)\} \cup L, R, D, b)$
if $t$ does not rewrite using a rule in $D$ but there is a rule $l_i \to r_i$ in $D$ such
that $\sigma = mgu(t, l_i)$.

3. $complete(\{t\} \cup L, R, D, b) = complete(\{\theta_i(t) \mid \theta_i \in compl(\sigma)\} \cup L, R', D', false)$,
where $l_i \to r_i$ is a rule in $R$ such that $\sigma = mgu(t, l_i)$,
$D' = D \cup \{\sigma(l_i) \to \sigma(r_i)\}$, and
$R' = R - \{l_i \to r_i\} \cup \{\theta_i(l_i) \to \theta_i(r_i) \mid \theta_i \in compl(\sigma)\}$.

4. $complete(\{\}, R, D, false) = complete(T, R, D, true)$, where $T = \{t_j \mid t_j$
appears as a recursive call to $f$ in the right side of a rule in $D\}$.

5. $complete(\{\}, R, D, true) = D$.

For example, consider computing the complete definition of $lhs = f(0, s(y))$ with
$R$ being

1. $f(0, 0) \to 0$, 2. $f(s(x), 0) \to s(f(0, x))$,

3. $f(0, s(x)) \to s(f(x, 0))$, 4. $f(s(x), s(y)) \to s(s(f(x, y)))$.

The initial call is $complete(\{f(0, s(y))\}, \{1, 2, 3, 4\}, \{\}, false)$. The left side of
rule 3 unifies with $f(0, s(y))$ by the substitution $x \to y$, for which the set of
complement substitutions is empty; we thus have $complete(\{\}, \{1, 2, 4\}, \{3\}, false)$.
By step 4, we get $complete(\{f(y, 0)\}, \{1, 2, 4\}, \{3\}, true)$ based on the recursive
call in rule 3. Since $f(y, 0)$ unifies with rule 1, we get
$complete(\{f(s(y), 0)\}, \{2, 4\}, \{1, 3\}, false)$; again using step 3 with rule 2, we get
$complete(\{\}, \{4\}, \{1, 2, 3\}, false)$. From step 4, we get
$complete(\{f(0, y), f(y, 0)\}, \{4\}, \{1, 2, 3\}, true)$. Using step 2, this reduces to
$complete(\{f(0, s(y)), f(y, 0)\}, \{4\}, \{1, 2, 3\}, true)$; using step 1, this reduces to
$complete(\{f(y, 0)\}, \{4\}, \{1, 2, 3\}, true)$. Again applying steps 2 and 1, respectively,
we get $complete(\{\}, \{4\}, \{1, 2, 3\}, true)$, which terminates with $\{1, 2, 3\}$ as the
rule set.
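The outcome of this trace can be cross-checked on ground instances. The sketch below is not the symbolic procedure $complete$; it merely evaluates $f$ on Python integers according to the four rules, records which rules fire, and confirms on a finite sample that normalizing instances of $f(0, s(y))$ never needs rule 4. The function name `f` follows the text; the `used` bookkeeping and the sample bound are assumptions.

```python
def f(a, b, used):
    """Evaluate f on naturals by rules 1-4, recording the rule numbers applied."""
    if a == 0 and b == 0:            # rule 1: f(0, 0) -> 0
        used.add(1)
        return 0
    if b == 0:                       # rule 2: f(s(x), 0) -> s(f(0, x))
        used.add(2)
        return 1 + f(0, a - 1, used)
    if a == 0:                       # rule 3: f(0, s(x)) -> s(f(x, 0))
        used.add(3)
        return 1 + f(b - 1, 0, used)
    used.add(4)                      # rule 4: f(s(x), s(y)) -> s(s(f(x, y)))
    return 2 + f(a - 1, b - 1, used)

used_rules = set()
for y in range(50):                  # ground instances f(0, s(y)) for small y
    f(0, y + 1, used_rules)
```

A finite sample is of course only evidence, not a proof, but it agrees with the rule set $\{1, 2, 3\}$ computed symbolically above.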

3.4 Procedure 

The procedure for generating lemmas for PA given below is invoked with the
most general $lhs = f(x_1, \dots, x_m)$ and $D_f$, where the $x_i$'s are distinct variables. If a
right side for this $lhs$ can be found (implying that constraints generated from
$D_{lhs}$ are consistent), then $lhs$ is expressible in PA and the procedure halts.
Otherwise, rule sets associated with maximal consistent subsets of constraints
are identified; for each such set, instances of $lhs$ are speculated; for each such
instance, say $\theta(lhs)$, a complete definition sufficient to normalize every
ground instance of $\theta(lhs)$ to a T-term is computed, and then the above steps
are repeated. Lemmas thus generated are added to $D_f$ to help speculate further
conjectures.

For a given $lhs$, to ensure that parametric equations can be generated from
each rule $l \to r$ in $D_{lhs}$, the left side $l$ must match $lhs$ using $\sigma$ (i.e.,
$\sigma(lhs) = l$) and each recursive call $t$ in $r$ must either match $lhs$ or $\sigma(t)$ must
simplify to a T-term using $D_{lhs}$.
Input: A PA-based, terminating, sufficiently-complete and ground-confluent definition $D_f$ of $f$.

Output: $\{f(s_1, \dots, s_k) = r\}$ where each $s_i$ and $r$ are PA terms such that
$D_f \models_{ind} f(s_1, \dots, s_k) = r$.

Method: Initially, $C_{lhs} = \{(f(x_1, \dots, x_m), D_f)\}$.
While there is an unmarked pair $(lhs, D_{lhs}) \in C_{lhs}$ do

1. Find right side: Let $rhs$ be a parametric term in PA expressed using variables in $lhs$.
Generate a set of parametric equations $S$ using $D_{lhs}$ and the conjecture $lhs = rhs$.

1.1 Success: If $S$ is consistent then let $\delta$ be the most general solution of $S$. Output lemma
$C$: $lhs = \delta(rhs)$. Mark $(lhs, D_{lhs})$ in $C_{lhs}$. Add $C$ to $D_f$ and simplify.
Go to step 3 for generating new candidate left sides based on the simplified $D_f$.

1.2 Fail: $lhs = rhs$ is not expressible in PA; mark $(lhs, D_{lhs})$ in $C_{lhs}$. Go to step 2.

2. Generate maximal consistent subsets of $S$: Generate from $S$ the set of pairs
$\{(MCS, R(MCS)) \mid MCS$ is a maximal consistent subset of $S$ extended after instantiating
weak parametric equations in $S\}$, where $R(MCS)$ is the rule set associated with $MCS$.

3. Identify left sides and definitions: From each pair $(MCS, R(MCS))$, identify new candidates
for the left side of conjectures using $ABS$. For each new candidate $lhs$, generate a complete
definition $D_{lhs}$ using the procedure $complete$. Add the unmarked pair $(lhs, D_{lhs})$ to $C_{lhs}$.
Repeat the above procedure.



4 Theories Admitting Multiple Canonical Forms 

The above procedure is an instance of a general procedure which works for
decidable theories admitting multiple canonical form schemes. In the case of
PA, a term in PA can be represented using a single parametric form. That is
not true in general. A recursive data structure, such as finite lists, has more than
one canonical form. In the theory of finite lists, a term can be a variable, or $nil$
representing an empty list, or $cons(x, l)$, where $x$ is an element and $l$ is a list.
Unlike PA, these canonical forms cannot be represented using a single scheme.
The theory of finite lists is an instance of the generic theory of free constructors
with $m$ distinct constructors, say $c_1, c_2, \dots, c_m$, $m > 1$.

Simple lemmas can be generated from recursive function definitions based in
such theories by considering each canonical form scheme as the possible right
side for the speculated left side of a conjecture. It is shown in [6] that, as in the
case of PA, the no-theory condition for the quantifier-free theory of free constructors
is decidable (i.e., whether a term $f(s_1, \dots, s_k)$ is equivalent to a constructor term
is decidable, where $f$ is T-based and each $s_i$ is a constructor term). Furthermore,
equations relating parametric terms in this theory can be solved for parameters
as in PA. Below, we illustrate the general procedure with $append$:

1. $append(nil, y) \to y$, 2. $append(cons(x_1, x), y) \to cons(x_1, append(x, y))$.

Since finite lists admit as canonical form schemes a variable, $nil$, and $cons(t_1, t_2)$,
where $t_1, t_2$ are themselves canonical form schemes, four conjectures with
$append(x, y)$ as their left side are speculated, with the right side being each of the
canonical forms: $nil$, $x$, $y$, and $cons(t_1, t_2)$.

To check whether $append(x, y)$ can be equivalent to some constructor term
$s$, nonrecursive rules are first used to generate constraints and perhaps rule out
most of the candidate canonical forms. From the first rule in the definition, since
there is a substitution $\sigma$ such that $\sigma(append(x, y)) = append(nil, y)$, it must be
the case that $\sigma(s) = y$. That is, when $nil$ is substituted for $x$ in $s$, it should
be equivalent to $y$; since in the theory of free constructors constructor terms
cannot be simplified, $s = y$. It is easy to see that assuming $append(x, y) = y$, the
constraint from the second rule, $y = cons(x_1, y)$, is unsatisfiable for any value of
$x_1, y$. This implies that there is no constructor term $s$ such that $append(x, y) = s$.
Speculating the right side to be $nil$, $y$, or $cons(t_1, t_2)$ can be shown not to lead
to any lemmas being speculated either.

Conjecturing $append(x, y) = x$, however, results in the nonrecursive rule
generating a constraint $nil = y$, which is weakly satisfiable only if $y = nil$. The second
rule generates the constraint $cons(x_1, x) = cons(x_1, x)$, a valid formula. Combining
the two constraints, a possible conjecture $append(x, nil) = x$ is speculated,
which is valid and thus a lemma.
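The effect of trying each canonical form as a right side can be mimicked by testing on sample lists. The following sketch assumes Python lists as the encoding of finite lists and a small hand-picked sample; it tests candidates on ground instances rather than solving constraints symbolically, so it only illustrates the outcome described above.

```python
def append(x, y):
    """append defined by its two rules."""
    if not x:                         # rule 1: append(nil, y) -> y
        return y
    return [x[0]] + append(x[1:], y)  # rule 2: append(cons(x1, x), y) -> cons(x1, append(x, y))

# Candidate right sides nil, x and y (cons(t1, t2) is omitted in this sketch).
candidates = {
    "nil": lambda x, y: [],
    "x": lambda x, y: x,
    "y": lambda x, y: y,
}
samples = [[], [1], [1, 2]]

# Candidates that hold for every sampled pair (x, y).
general = [name for name, rhs in candidates.items()
           if all(append(x, y) == rhs(x, y) for x in samples for y in samples)]

# Candidates that hold once y is instantiated to nil.
with_nil = [name for name, rhs in candidates.items()
            if all(append(x, []) == rhs(x, []) for x in samples)]
```

No candidate survives for general `x` and `y`, while instantiating `y` to `nil` leaves exactly the candidate `x`, matching the lemma $append(x, nil) = x$.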

5 Generating Complex Lemmas 

So far, the focus has been on generating simple lemmas of the form
$f(s_1, \dots, s_k) = r$, where $r$ and each $s_i$ are T-terms. Using results from [11] (see
also [6]), lemmas whose left sides include multiple occurrences of T-based functions
can also be generated. With this extension, it is possible to generate many
nontrivial lemmas used in the verification of arithmetic circuits [9,10,12]. This is
illustrated using an example in the next subsection.

To consider conjectures in which many T-based function symbols can occur,
it is first necessary to ensure that the definitions of function symbols appearing in
such conjectures are compatible in the sense of [11,6]. Compatibility ensures that
when a term built from a compatible sequence of function symbols is instantiated
and normalized, the result is a term in T. Informally, $g$ is compatible with $f$
if for every rule of $f$, the context created by the rule on its right side can be
simplified by the definition of $g$.

Definition 4 (Compatibility [6]). Let $g, f$ be T-based, $f \notin T_C$, and
$1 \le j \le m = arity(g)$. The definition of $g$ is compatible with the definition of $f$
on argument $j$ iff for all rules $\alpha : f(s^*, y^*) \to C[f(t_1^*, y^*), \dots, f(t_n^*, y^*)]$, either
$n = 0$ and $\alpha \in Exc_{g,f}$, or
$g(x_1, \dots, x_{j-1}, C[z_1, \dots, z_n], x_{j+1}, \dots, x_m) \to^{*} D[g(x_1, \dots, x_{j-1}, z_{i_1}, x_{j+1}, \dots, x_m), \dots, g(x_1, \dots, x_{j-1}, z_{i_k}, x_{j+1}, \dots, x_m)]$
for a context $D$ over $T_C$, $i_1, \dots, i_k \in \{1, \dots, n\}$, and $z_i \notin V(D)$ for all $i$.

$Exc_{g,f}$ stands for the nonrecursive rules in the definition of $f$ whose right sides
cannot be simplified by the definition of $g$ [6]. Relaxing the requirement, so that $g$
does not have to be compatible with the nonrecursive rules in the definition of
$f$, allows more function definitions to be compatible.

For example, $mod2$ and $half$ compute, respectively, the remainder and quotient
on division by 2. These definitions are based in the theory of free constructors
$0$, $s$ (and also PA).

1. $mod2(0) \to 0$, 2. $mod2(1) \to 1$, 3. $mod2(s(s(x))) \to mod2(x)$.

4. $half(0) \to 0$, 5. $half(1) \to 0$, 6. $half(s(s(x))) \to s(half(x))$.

It is easy to check that $half$ is compatible with $mod2$. However, $mod2$ is not
compatible with $half$, since $mod2(half(s(s(x))))$ generated from rule 6 reduces to
$mod2(s(half(x)))$ using the rule and cannot be simplified any further. Also,
$mod2$ is compatible with $mod2$, but $half$ is not compatible with $half$.

The compatibility property can be easily checked from the definitions of
function symbols, and compatible sequences of function symbols can be easily
generated.

Using the procedure discussed in Section 3, it is easy to see that neither
$mod2(x)$ nor $half(x)$ is expressible in PA. Furthermore, the procedure will
not generate any simple lemma about $mod2$ and $half$.

Given PA-based function symbols $g$ and $f$ such that $g$ is compatible with
$f$, a conjecture $g(y_1, \dots, f(x_1, \dots, x_n), \dots, y_k) = \sum l_i x_i + \sum k_j y_j + k_0$ can be
speculated, where the $x$'s and $y$'s are distinct variables and $f$ appears only in the
inductive argument positions of $g$ (on which the definition of $g$ recurses). The
rest is the same as in Section 3. We illustrate the procedure using the example
$half(mod2(x))$.

Given $half(mod2(x)) = k_1 x + k_2$, where the parameters $k_1, k_2$ are unknown,
parametric equations are generated using the rules defining $mod2$. From rule
1, $x$ is instantiated to $0$, giving $half(mod2(0)) = k_1 0 + k_2$, which simplifies
using rules 1 and 4 to the constraint $k_2 = 0$. From rule 2, $x$ is instantiated to
$1$, giving $half(mod2(1)) = k_1 1 + k_2$, which simplifies using rules 2 and 5 to
the constraint $k_1 + k_2 = 0$. From rule 3, $x$ is instantiated to $s(s(u))$, giving the
conclusion subgoal $half(mod2(s(s(u)))) = k_1 u + k_1 + k_1 + k_2$, assuming the
induction hypothesis $half(mod2(u)) = k_1 u + k_2$, obtained by instantiating $x$ to
be $u$. After simplifying using rules 3 and 6, and using the induction hypothesis,
we get $k_1 u + k_2 = k_1 u + k_1 + k_1 + k_2$, which simplifies to $k_1 + k_1 = 0$. These
three constraints are consistent, giving $k_1 = k_2 = 0$ as the solution, and the lemma
$half(mod2(x)) = 0$ is generated.
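The derivation above can be replayed numerically. The sketch below encodes rules 1 to 6 on Python integers, finds the parameter values satisfying the three constraints by a small brute-force search (the search range is an arbitrary assumption, and a real implementation would solve the linear constraints directly), and then tests the resulting lemma on a finite range of inputs.

```python
def mod2(n):
    """Rules 1-3: remainder on division by 2."""
    if n == 0:
        return 0
    if n == 1:
        return 1
    return mod2(n - 2)

def half(n):
    """Rules 4-6: quotient on division by 2."""
    if n <= 1:
        return 0
    return 1 + half(n - 2)

# Constraints from rules 1, 2 and 3: k2 = 0, k1 + k2 = 0, k1 + k1 = 0.
solutions = [(k1, k2)
             for k1 in range(-3, 4)
             for k2 in range(-3, 4)
             if k2 == 0 and k1 + k2 == 0 and k1 + k1 == 0]

k1, k2 = solutions[0]
lemma_holds = all(half(mod2(x)) == k1 * x + k2 for x in range(100))
```

The only solution in the searched range is `(0, 0)`, and the instantiated conjecture holds on all sampled inputs.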

The notion of compatibility can be extended to that of a compatible sequence
of function symbols $[f_1, \dots, f_m]$, where each $f_i$ is compatible with $f_{i+1}$,
$1 \le i \le m - 1$. Given a sequence $[g, f_1, f_2]$, for example, we can generate the left
side of conjectures of the form
$g(y_1, \dots, f_1(x_1, \dots, f_2(z_1, \dots, z_m), \dots), \dots, y_k)$
such that the variables $x$'s, $y$'s and $z$'s are distinct and the variables $z$'s in the
inductive positions of $f_2$ do not appear in the $x$'s and $y$'s. The left side of a conjecture
thus generated can be checked for the no-theory condition.

If a conjecture generated from a compatible sequence of function definitions
cannot be expressed in T, then appropriate instances of the conjecture are found
as in Section 3. Some of the lemmas generated by the proposed approach
based on the theories of PA and finite lists are given in Figure 1. Automatic
generation of one such lemma is illustrated in the next subsection, where some
of the functions appearing in Figure 1 are also defined.



Note that besides the rules in the definitions of $f$ and $g$, other applicable rules may
be used to simplify terms to check compatibility of functions $f$ and $g$. Properties of
function symbols such as associativity-commutativity (AC) may also be used.

Inductive positions of $f_2$ are the positions of $f_2$ involved in recursion in the definition of $f_2$.
Lemma                                                 Theory            Remarks

bton(radder(ntob(x), ntob(x), ntob(x))) = x + x + x   Finite lists, PA  Correctness of ripple carry
bton(cadder(ntob(x), ntob(x), ntob(x))) = x + x + x   Finite lists, PA  Correctness of carry save
bton(leftshift(ntob(x))) = x + x                      Finite lists      Correctness of left-shift
bton(pad0(ntob(x))) = x                               Finite lists, PA  Correctness of padding
compl(compl(x)) = x                                   Finite lists      Complement's idempotence
half(mod2(x)) = 0                                     PA                Multiple functions
mod2(mod2(x)) = 0                                     PA                Multiple functions
ack(1, n) = n + 2                                     PA                Nested recursive calls
ack(2, n) = 2n + 3                                    PA                Nested recursive calls
rev(rev(x)) = x                                       Finite lists      Multiple canonical forms
append(x, nil) = x                                    Finite lists      Multiple canonical forms

Fig. 1. Some Examples of Generated Lemmas



5.1 Generating Lemmas for Arithmetic Circuits 

Consider the following definitions of $bton$, $ntob$ and $pad0$. The functions $bton$ and
$ntob$ convert binary to decimal representations and vice versa, respectively. The
function $pad0$ adds a leading binary zero to a bit list. Bits increase in significance
in the list, with the first element of the list being the least significant. These
functions are used to reason about number-theoretic properties of parameterized
arithmetic circuits [8,10].

1. $bton(nil) \to 0$, 2. $bton(cons(b0, y_1)) \to bton(y_1) + bton(y_1)$,

3. $bton(cons(b1, y_1)) \to s(bton(y_1) + bton(y_1))$.

4. $ntob(0) \to cons(b0, nil)$, 5. $ntob(s(0)) \to cons(b1, nil)$,

6. $ntob(s(s(x_2 + x_2))) \to cons(b0, ntob(s(x_2)))$,

7. $ntob(s(s(s(x_2 + x_2)))) \to cons(b1, ntob(s(x_2)))$.

8. $pad0(nil) \to cons(b0, nil)$, 9. $pad0(cons(b0, y)) \to cons(b0, pad0(y))$,

10. $pad0(cons(b1, y)) \to cons(b1, pad0(y))$.

The underlying decidable theory T is the combination of the quantifier-free
theories of finite lists of bits and PA; $b0$, $b1$ stand for binary 0 and 1. Definitions of
$bton$, $ntob$, and $pad0$ are T-based. The function $pad0$ is compatible with $ntob$; $bton$
is compatible with $pad0$. Hence $[bton, pad0, ntob]$ form a compatible sequence.

Padding of output bit vectors of one stage with leading zeros before using 
them as input to the next stage is common in multiplier circuits realized using a 
tree of carry-save adders. An important lemma needed to establish the correct- 
ness of such circuits is that the padding does not affect the number output by 
the circuit. We illustrate how this lemma can be automatically generated. 

A conjecture with $bton(pad0(ntob(x)))$ as the left side is attempted. Since the
sort of $bton$ is natural numbers, the right side of the conjecture is $k_1 x + k_2$. As
before, parametric equations are generated corresponding to the instantiations
of $x$ for each of the four rules defining the innermost function $ntob$.

- From rule 4, $x$ gets instantiated to $0$, giving $bton(pad0(ntob(0))) = k_1 0 + k_2$.
This simplifies using the rules 4, 9, 8, 2, and 1 and PA to the constraint
s1: $0 = k_2$.

- From rule 5, $x$ gets instantiated to $1$, giving $bton(pad0(ntob(1))) = k_1 1 + k_2$.
This simplifies using the rules 5, 10, 8, 2, and 1 and PA to the constraint
s2: $1 = k_1 + k_2$.

- From rule 6, $x$ is instantiated as $s(s(x_2 + x_2))$ for the conclusion subgoal,
with $x$ instantiated as $s(x_2)$ for the induction hypothesis. This gives
$bton(pad0(ntob(s(s(x_2 + x_2))))) = k_1 s(s(x_2 + x_2)) + k_2$ as the conclusion with
$bton(pad0(ntob(s(x_2)))) = k_1 s(x_2) + k_2$ as the hypothesis. The conclusion
simplifies using rules 6, 9, 2 to
$bton(pad0(ntob(s(x_2)))) + bton(pad0(ntob(s(x_2)))) = k_1 s(s(x_2 + x_2)) + k_2$.
Applying the hypothesis gives
$k_1 s(x_2) + k_2 + k_1 s(x_2) + k_2 = k_1 s(s(x_2 + x_2)) + k_2$, which gives the constraint
s3: $k_2 = 0$.

- The parametric equation from rule 7 is similarly generated and gives the constraint
s4: $k_2 = 0$.

Taking the conjunction of all the constraints gives $\{k_1 = 1, k_2 = 0\}$, implying
that it is a consistent set. This gives the lemma

$bton(pad0(ntob(x))) = x$,

which states that $pad0$ does not change the numeric representation of a bit list,
as required in the verification of properties of multiplier circuits in [8,12].
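The three definitions and the generated lemma can be checked on ground instances with a direct transcription of rules 1 to 10. The encoding below is an assumption of this sketch: bit lists are Python lists of 0 and 1 with the least significant bit first, and rules 6 and 7 are folded into one clause using `n % 2` and `n // 2`.

```python
def bton(bits):
    """Rules 1-3: value of a bit list, least significant bit first."""
    if not bits:
        return 0
    rest = bton(bits[1:])
    return rest + rest + bits[0]      # bit b0 adds 0 (rule 2), bit b1 adds 1 (rule 3)

def ntob(n):
    """Rules 4-7: bit list of a natural number."""
    if n == 0:
        return [0]
    if n == 1:
        return [1]
    # n = 2 + 2*x2 or n = 3 + 2*x2; in both cases recurse on 1 + x2 = n // 2
    return [n % 2] + ntob(n // 2)

def pad0(bits):
    """Rules 8-10: append a most significant zero bit."""
    if not bits:
        return [0]
    return [bits[0]] + pad0(bits[1:])

# Constraints s1 and s2 from the two base cases determine the parameters.
k2 = bton(pad0(ntob(0)))              # s1: 0 = k2
k1 = bton(pad0(ntob(1))) - k2         # s2: 1 = k1 + k2

lemma_holds = all(bton(pad0(ntob(x))) == k1 * x + k2 for x in range(200))
```

The base cases yield `k1 = 1` and `k2 = 0`, and the lemma holds on every sampled input, in line with the constraint solution given in the text.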



6 Conclusion and Further Research 

A procedure for automatically generating lemmas from recursive function
definitions is given using decision procedures. It is expected that after a user has
written recursive definitions of functions on theories such as PA and on
constructors, the procedure will automatically start generating lemmas one side of
which is a term from the underlying decidable theory. In this way, a library of
useful lemmas can be automatically built with the hope that these lemmas will
be found useful in the proofs of other nontrivial lemmas. This is demonstrated
using examples about commonly used function definitions on numbers and finite
lists, as well as from our experiments in automatically verifying number-theoretic
properties of arithmetic hardware circuits.

Conjectures considered in this paper have one side that is a term from a
decidable theory. In [5], the approach proposed in [11] has been extended to
decide quantifier-free formulas which are boolean combinations of such equations.
Recently, in [6], a decision procedure for a class of equations in which function
symbols with T-based definitions can appear on both sides of equations has been
developed. The proposed procedure needs to be extended so that a wider class
of lemmas can be automatically generated from T-based function definitions.

We have discussed two well-known quantifier-free decidable theories on
numbers: the theory of free constructors $0$, $s$, and the quantifier-free theory of
Presburger arithmetic. Most of the examples in the paper are done in the
framework of the quantifier-free theory of Presburger arithmetic and the theory of
finite lists. There exists an even more expressive and decidable theory of numbers
involving $0$, $1$, $s$, $+$, $2^{x}$, $>$; we will call it the theory of 2exp. These three theories
define a strict hierarchy in their expressive power about properties of numbers.
Given a function definition on numbers based in a theory T, it is possible to
speculate conjectures using the most expressive theory (e.g., the theory of 2exp). If a
given conjecture cannot be expressed in the most expressive theory in the
hierarchy, then it cannot be expressed in any theory in the hierarchy. However, the more
expressive the theory, the more complex solving parametric equations in it can be.
There is thus a trade-off. An alternate method is to start with the least
expressive theory (e.g., T, the theory in which the definition is based), and then move
to more expressive theories for conjectures which cannot be expressed in T. The
proposed approach will work with either of the two heuristics for trying theories
in different orders. In fact, some of the examples in the paper, e.g., $e2plus$, could
have been attempted using the theory of exponentiation. The function $e2plus$,
for instance, can be shown to be expressible in this theory. Similarly, instances of
Ackermann's function including $ack(0, y)$, $ack(1, y)$, $ack(2, y)$, $ack(3, y)$ can all be
expressed in the theory of exponentiation. The expanded version, available from
http://faculty.ist.unomaha.edu/msubramaniam/subuweb.htm, includes an
Appendix in which the derivation of lemmas about Ackermann's function is
discussed. Ackermann's function is not even PA-based, because of nested recursive
calls in it; however, the proposed approach still works on it, provided parametric
equations in PA are extended to include nonlinear polynomials over parameters
as coefficients of variables, and limited reasoning about exponentiation and
multiplication is available to simplify parametric constraints.
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Abstract. Several induction theorem provers were developed to verify 
functional programs mechanically. Unfortunately, automated verification 
usually fails for functions with accumulating arguments. In particular, 
this holds for tail-recursive functions that correspond to imperative pro- 
grams, but also for programs with nested recursion. 

Based on results from the theory of tree transducers, we develop an au- 
tomatic transformation technique. It transforms accumulative functional 
programs into non-accumulative ones, which are much better suited for 
automated verification by induction theorem provers. Hence, in contrast 
to classical program transformations aiming at improving the efficiency, 
the goal of our deaccumulation technique is to improve the provability. 

1 Introduction 

In safety-critical applications, a formal verification of programs is required. 
However, since mathematical correctness proofs are very expensive and time- 
consuming, one tries to automate this task as much as possible. Since induction 
is an important proof technique required for program verification, several induc- 
tion theorem provers have been developed, which can be used for mechanized 
reasoning about program properties (e.g., NQTHM [4], ACL-2 [17], RRL [16], 
CLAM [5], INKA [1,26], and SPIKE [3]). However, while such provers are suc- 
cessfully applied for functional programs, they often have severe problems in 
dealing with imperative programs. 

As a running example, we consider the calculation of a decreasing list
containing the first $x_1$ even numbers (i.e., $[2x_1 - 2, \ldots, 4, 2, 0]$).
This problem can be solved by the following part $p_{even}$ of an imperative
program (in C-like syntax):

    [int] even (int x1)
    { int y1 = 0; [int] y2 = [];
      while (x1 != 0) { y2 = y1:y2; y1 = y1+2; x1--; }
      return y2; }

Here, [int] denotes the type of integer lists, [] denotes the empty list, and :
denotes list insertion, i.e., $y_1 : y_2$ inserts the element $y_1$ in front of
the list $y_2$.

* Research of this author supported by the DFG under grants KU 1290/2-1 and 2-4.

Classical techniques for verifying imperative programs are based on inventing
suitable loop invariants [13]. However, while there are heuristics for finding loop
invariants [15,23], in general this task is hard to mechanize [7].

Instead, our aim is to use the existing powerful induction theorem provers 
also for the verification of imperative programs. To this end, imperative pro- 
grams are translated into the functional input language of induction provers. In 
the absence of pointers, such an automatic translation is easily possible [20] by 
transforming every while-loop into a separate function whose parameters record 
the changes during a run through the while-loop. For our program $p_{even}$ we
obtain the following tail-recursive program $p_{acc}$ (in Haskell-like syntax)
together with an initial call $r_{acc} = (f\ x_1\ 0\ [])$. It uses pattern
matching on $x_1$ (called recursion argument) and represents natural numbers
with the constructors 0 and S for the successor function:

    p_acc :  f (S x1) y1 y2 = f x1 (S (S y1)) (y1 : y2)
             f 0      y1 y2 = y2

The above translation of imperative into functional programs always yields
tail-recursive functions that compute their result using accumulators. Indeed,
$f$ accumulates values in its context arguments (arguments different from the
recursion argument, i.e., $f$'s second and third argument). A function is called
accumulative if its context arguments are modified in its recursive calls. For
instance, $f$ is accumulative, because both the second and the third argument do
not remain unchanged in the recursive call. A program like $p_{acc}$ is called
accumulative if it contains an accumulative function.

Assume that our aim is to verify the equivalence of $r_{acc}$ and
$r_q = (q\ x_1)$ for all natural numbers $x_1$, where $p_q$ is the following
functional specification of our problem. Here, $(q\ x_1)$ calculates the desired
list and $(q'\ x_1)$ computes $2 \cdot x_1$.

    q (S x1) = (q' x1) : (q x1)        q' (S x1) = S (S (q' x1))
    q 0      = []                      q' 0      = 0

Note that even if there exists a “natural” non-accumulative recursive specifica- 
tion of a problem, imperative programs are typically written using loops, which 
translate into accumulative programs. The accumulative version may also be 
more efficient than a non-accumulative implementation (see e.g., App. B). 
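
For readers who want to experiment with the running example, the following Haskell sketch gives one possible typed rendering of $p_{acc}$ and of the specification $p_q$. The type Nat and the helpers fromInt and agreeUpTo are illustrative assumptions and not part of the paper's untyped language. Checking finitely many inputs is only a sanity test, not a replacement for an inductive proof.

```haskell
-- Peano naturals, mirroring the constructors 0 and S (assumed typed encoding).
data Nat = Z | S Nat deriving (Eq, Show)

-- Accumulative program p_acc: y1 and y2 are the accumulators.
f :: Nat -> Nat -> [Nat] -> [Nat]
f (S x1) y1 y2 = f x1 (S (S y1)) (y1 : y2)
f Z      _  y2 = y2

-- Non-accumulative specification p_q.
q :: Nat -> [Nat]
q (S x1) = q' x1 : q x1
q Z      = []

q' :: Nat -> Nat
q' (S x1) = S (S (q' x1))
q' Z      = Z

-- Hypothetical helper: build S^n Z from a non-negative Int.
fromInt :: Int -> Nat
fromInt n = iterate S Z !! n

-- Hypothetical helper: test (f x1 0 []) == (q x1) for x1 = 0, ..., k.
agreeUpTo :: Int -> Bool
agreeUpTo k = and [ f x1 Z [] == q x1 | x1 <- map fromInt [0 .. k] ]
```

Evaluating agreeUpTo for a small bound exercises both programs on the same inputs.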

But unfortunately, accumulative programs are not suitable for mechanized 
verification. For example, an automatic proof of 

$$(f\ x_1\ 0\ []) = (q\ x_1)$$

by induction (using this equation for fixed $x_1$ as induction hypothesis)
fails, because in the induction step $(x_1 \mapsto (S\ x_1))$ the induction
hypothesis cannot be successfully applied to prove
$(f\ (S\ x_1)\ 0\ []) = (q\ (S\ x_1))$. For instance, for this conjecture the
ACL-2 prover performs a series of generalizations that do not increase
verifiability, and it ends up consuming all available memory. The reason for
the verification problems is that $f$ uses accumulators: the context arguments
of the term $(f\ x_1\ (S\ (S\ 0))\ (0 : []))$, which originates from rule
application to $(f\ (S\ x_1)\ 0\ [])$, do not match the context arguments of
the term $(f\ x_1\ 0\ [])$ in the induction hypothesis! So the problem is that
accumulating parameters are typically initialized with some fixed values (like
0 and []), which then appear also in the conjecture to be proved and hence in
the induction hypothesis. But since accumulators are changed in recursive calls,
after rule application we have different values like $(S\ (S\ 0))$ and
$(0 : [])$ in the induction conclusion of the step case.

In induction theorem proving, this problem is usually solved by transforming
the conjecture to be proved. In other words, the aim is to invent a suitable
generalization (see, e.g., [4,14,15,26]). So, instead of the original conjecture
$(f\ x_1\ 0\ []) = (q\ x_1)$, one tries to find a stronger conjecture that is
nevertheless easier to prove. In our example, the original conjecture may be
generalized to

$$(f\ x_1\ y_1\ y_2) = (q\ x_1\ y_1) +\!\!+\ y_2,$$

where $+\!\!+$ denotes list concatenation and where $q$ and $q'$ are defined as
follows:

    q (S x1) y1 = (q' x1 y1) : (q x1 y1)     q' (S x1) y1 = S (S (q' x1 y1))
    q 0      y1 = []                         q' 0      y1 = y1

However, finding such generalizations automatically is again very hard. In fact, it 
is as difficult as discovering loop invariants for the original imperative program. 
Therefore, developing techniques to verify accumulative functions is one of the 
most important research topics in the area of inductive theorem proving [14]. 
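
The generalized conjecture can at least be sanity-checked on small inputs before a proof is attempted. The following is a minimal sketch in Haskell, assuming a typed encoding with a Peano type Nat. The names qG and qG' (for the two-argument versions of $q$ and $q'$), fromInt and holdsUpTo are hypothetical.

```haskell
data Nat = Z | S Nat deriving (Eq, Show)

-- Accumulative function f from p_acc.
f :: Nat -> Nat -> [Nat] -> [Nat]
f (S x1) y1 y2 = f x1 (S (S y1)) (y1 : y2)
f Z      _  y2 = y2

-- Generalized specification: qG x1 y1 lists y1 + 2*(x1-1), ..., y1 + 2, y1.
qG :: Nat -> Nat -> [Nat]
qG (S x1) y1 = qG' x1 y1 : qG x1 y1
qG Z      _  = []

-- qG' x1 y1 computes y1 + 2*x1.
qG' :: Nat -> Nat -> Nat
qG' (S x1) y1 = S (S (qG' x1 y1))
qG' Z      y1 = y1

fromInt :: Int -> Nat
fromInt n = iterate S Z !! n

-- Test (f x1 y1 y2) == (qG x1 y1) ++ y2 on a small grid of inputs.
holdsUpTo :: Int -> Bool
holdsUpTo k =
  and [ f x1 y1 y2 == qG x1 y1 ++ y2
      | x1 <- nats, y1 <- nats, y2 <- [[], [Z], [S Z, Z]] ]
  where
    nats = map fromInt [0 .. k]
```

Such a test only shows that the generalization is plausible. Finding it in the first place is the hard part that the text describes.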

In contrast to the classical approach of generalizing conjectures, we suggest 
an automated program transformation, which transforms functions that are hard 
to verify into functions that are much more suitable for mechanized verification. 
The advantage of this approach is that it works fully automatically and that by 
transforming a function definition, the verification problems with this function 
are solved once and for all (i.e., for all conjectures one would like to prove about 
this function). In contrast, when using the generalization approach, one has to 
find a new generalization for every new conjecture to be proved. In particu- 
lar, finding generalizations automatically is difficult for conjectures with several 
occurrences of an accumulative function (see e.g., [12] and App. A and B). 

The semantics-preserving transformation to be presented in this paper
transforms the original program $p_{acc}$ into the following program $p_{non}$

    p_non :  f' (S x1) = sub (f' x1) (S (S 0)) (0 : [])
             f' 0      = []

    sub (x1 : x2) y1 y2 = (sub x1 y1 y2) : (sub x2 y1 y2)     sub 0  y1 y2 = y1
    sub (S x1)    y1 y2 = S (sub x1 y1 y2)                     sub [] y1 y2 = y2

together with an initial call $r_{non} = (f'\ x_1)$. Since $p_{non}$ contains a
function $f'$ without context arguments, and a function sub with unchanged
context arguments in recursive calls, $p_{non}$ is a non-accumulative program
and our transformation technique is called deaccumulation. An application of
the substitution function¹ sub of the form $(sub\ t\ s_1\ s_2)$ replaces all
occurrences of 0 in the term $t$ by the term $s_1$ and all occurrences of [] by
$s_2$. For instance, the decreasing list of the first three even numbers is
computed by $p_{non}$ as follows:

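$$\begin{array}{rl}
              & f'\ (S^3\ 0) \\
\Rightarrow^3 & sub\ (sub\ (sub\ (f'\ 0)\ (S^2\ 0)\ (0 : []))\ (S^2\ 0)\ (0 : []))\ (S^2\ 0)\ (0 : []) \\
\Rightarrow^2 & sub\ (sub\ (0 : [])\ (S^2\ 0)\ (0 : []))\ (S^2\ 0)\ (0 : []) \\
\Rightarrow^3 & sub\ ((S^2\ 0) : (0 : []))\ (S^2\ 0)\ (0 : []) \\
\Rightarrow^7 & (S^4\ 0) : ((S^2\ 0) : (0 : []))
\end{array}$$

This computation shows that the constructors 0 and [] in $p_{non}$ are used as
“placeholders”, which are repeatedly substituted by $(S^2\ 0)$ and $(0 : [])$,
respectively.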
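
The placeholder behaviour can be reproduced with a small executable model. The sketch below assumes a single untyped term type Term, in the spirit of the paper's untyped language. The helpers num and list are hypothetical conveniences for building test terms.

```haskell
-- One term type for numbers and lists: 0, S, [] and (:) are all constructors.
data Term = Z | S Term | Nil | Cons Term Term deriving (Eq, Show)

-- f' has no context arguments. Equations for Nil and Cons are left out,
-- as the paper omits equations that are not intended to be used.
f' :: Term -> Term
f' (S x1) = sub (f' x1) (S (S Z)) (Cons Z Nil)
f' Z      = Nil

-- sub t s1 s2 replaces every 0 in t by s1 and every [] in t by s2.
sub :: Term -> Term -> Term -> Term
sub (Cons x1 x2) y1 y2 = Cons (sub x1 y1 y2) (sub x2 y1 y2)
sub (S x1)       y1 y2 = S (sub x1 y1 y2)
sub Z            y1 _  = y1
sub Nil          _  y2 = y2

-- Hypothetical helpers: S^n 0 and a Term-level list.
num :: Int -> Term
num n = iterate S Z !! n

list :: [Term] -> Term
list = foldr Cons Nil
```

With these definitions, f' (num 3) evaluates to the term for the list of 4, 2 and 0, matching the computation above.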

Now, the statement $(f'\ x_1) = (q\ x_1)$ (taken as induction hypothesis IH1)
can be proved automatically by three nested inductions as follows. During the
proof, the new subgoals

    IH2 : $(sub\ (q\ x_1)\ (S^2\ 0)\ (0 : [])) = ((q'\ x_1) : (q\ x_1))$ and
    IH3 : $(sub\ (q'\ x_1)\ (S^2\ 0)\ (0 : [])) = (S^2\ (q'\ x_1))$

are generated. Note that there is no need to invent these subgoals manually here,
as these proof obligations show up automatically during the course of the proof.
We only give the induction steps $(x_1 \mapsto (S\ x_1))$ of the first two
inductions and omit the base cases $(x_1 = 0)$. A similar proof can also be
generated by existing induction theorem provers like ACL-2.

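$$\begin{array}{lll}
  & f'\ (S\ x_1) & \\
= & sub\ (f'\ x_1)\ (S^2\ 0)\ (0 : []) & \\
= & sub\ (q\ x_1)\ (S^2\ 0)\ (0 : []) & \text{(IH1)} \\
= & (q'\ x_1) : (q\ x_1) & \text{(IH2)} \\
= & q\ (S\ x_1) &
\end{array}$$

$$\begin{array}{lll}
  & sub\ (q\ (S\ x_1))\ (S^2\ 0)\ (0 : []) & \\
= & sub\ ((q'\ x_1) : (q\ x_1))\ (S^2\ 0)\ (0 : []) & \\
= & (sub\ (q'\ x_1)\ (S^2\ 0)\ (0 : [])) : (sub\ (q\ x_1)\ (S^2\ 0)\ (0 : [])) & \\
= & (sub\ (q'\ x_1)\ (S^2\ 0)\ (0 : [])) : ((q'\ x_1) : (q\ x_1)) & \text{(IH2)} \\
= & (S^2\ (q'\ x_1)) : ((q'\ x_1) : (q\ x_1)) & \text{(IH3)} \\
= & (q'\ (S\ x_1)) : (q\ (S\ x_1)) &
\end{array}$$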

In this paper we consider the definition of $f$ in $p_{acc}$ as a macro tree
transducer (for short mtt) [8,9,11] with one function: in general, such an $f$
is defined by case analysis on the root symbol of the recursion argument $t$.
The right-hand side of an equation for $f$ may only contain (extended)
primitive-recursive function calls, i.e., the recursion argument of $f$ has to
be a variable that refers to a subtree of $t$. The functions $f'$ and sub
together are viewed as a 2-modular tree transducer (for short modtt) [10],
where it is allowed that a function in module 1 (here $f'$) calls a function in
module 2 (here sub) non-primitive-recursively.



¹ For simplicity, we regard an untyped language. When introducing types, one would
generate several substitution functions for the different types of arguments.

We slightly modify a decomposition technique from [19] that is based on
results in [8,9,10] and transforms mtts like $f$ into modtts like $f'$ and sub
without accumulators. Unfortunately, it turns out that the new programs are
still not suitable for automatic verification. Since their verification
problems are caused only by the form of the new initial calls, we suggest
another transformation step, called constructor replacement, which yields
initial calls of the innocuous form $(f'\ x_1)$ without initial values like 0
and [].

Since the class of mtts contains not only tail-recursive programs, but also
programs with nested recursion, we will demonstrate by examples that our
transformation is useful not only for functions resulting from the translation
of imperative programs, but for accumulative functional programs in general.

Besides this introduction, the paper contains four further sections and two 
appendices. In Sect. 2 we fix the required notions and notations and introduce 
our functional language and tree transducers. Sect. 3 presents the deaccumula- 
tion technique. Sect. 4 compares our technique to related work. Finally, Sect. 5 
contains future research topics. Two additional examples demonstrating the ap- 
plication of our approach can be found in the appendices. 

2 Preliminaries and Language 

For every natural number $m \in \mathbb{N}$, $[m]$ denotes the set
$\{1, \ldots, m\}$. We use the sets $X = \{x_1, x_2, x_3, \ldots\}$ and
$Y = \{y_1, y_2, y_3, \ldots\}$ of variables. For every $n \in \mathbb{N}$, let
$X_n = \{x_1, \ldots, x_n\}$ and $Y_n = \{y_1, \ldots, y_n\}$. In particular,
$X_0 = Y_0 = \emptyset$.

A ranked alphabet $(C, rank)$ consists of a finite set $C$ and a mapping
$rank : C \to \mathbb{N}$ where $rank(c)$ is the arity of $c$. We define
$C^{(n)} = \{c \in C \mid rank(c) = n\}$. The set of trees (or ground terms)
over $C$, denoted by $T_C$, is the smallest subset
$T \subseteq (C \cup \{(\} \cup \{)\})^*$ with $C^{(0)} \subseteq T$ and for
every $c \in C^{(n)}$ with $n \in \mathbb{N} - \{0\}$ and
$t_1, \ldots, t_n \in T$: $(c\ t_1 \ldots t_n) \in T$. For a term $t$, pairwise
distinct variables $x_1, \ldots, x_n$, and terms $t_1, \ldots, t_n$, we denote
by $t[x_1/t_1, \ldots, x_n/t_n]$ the term that is obtained from $t$ by
substituting every occurrence of $x_j$ in $t$ by $t_j$. We abbreviate
$[x_1/t_1, \ldots, x_n/t_n]$ by $[x_j/t_j]$, if the involved variables and
terms are clear.

We consider a simple first-order, constructor-based functional programming
language $P$ as source and target language for the transformations. Every
program $p \in P$ consists of several modules. In every module a function is
defined by complete case analysis on the first argument (recursion argument)
via pattern matching, where only flat patterns of the form
$(c\ x_1 \ldots x_k)$ for constructors $c$ and variables $x_i$ are allowed. The
other arguments are called context arguments. If, in a right-hand side of a
function definition, there is a call of the same function, then the first
argument of this function call has to be a subtree $x_i$ of the first argument
in the corresponding left-hand side. To ease readability, we choose an untyped
ranked alphabet $C_p$ of constructors, which is used to build up input and
output trees of every function in $p$. In example programs and program
transformations we relax the completeness of function definitions on $T_{C_p}$
by leaving out those equations which are not intended to be used in
evaluations.

Definition 1 Let $C$ and $F$ be ranked alphabets of constructors and defined
function symbols, respectively, such that $F^{(0)} = \emptyset$, and $X$, $Y$,
$C$, $F$ are pairwise disjoint. We define the sets $P$, $M$, $R$ of programs,
modules, and right-hand sides as follows. Here, $p$, $m$, $r$, $c$, $f$ range
over the sets $P$, $M$, $R$, $C$, $F$, respectively.

$$\begin{array}{rcll}
p & ::= & m_1 \ldots m_l & \text{(program)} \\
m & ::= & f\ (c_1\ x_1 \ldots x_{k_1})\ y_1 \ldots y_n = r_1 & \text{(module)} \\
  &     & \quad \vdots & \\
  &     & f\ (c_q\ x_1 \ldots x_{k_q})\ y_1 \ldots y_n = r_q & \\
r & ::= & x_i \mid y_j \mid c\ r_1 \ldots r_k \mid f\ r_0\ r_1 \ldots r_n & \text{(right-hand side)}
\end{array}$$

The sets of constructors, defined functions, and modules that occur in
$p \in P$ are denoted by $C_p$, $F_p$, and $M_p$, respectively. For every
$f \in F_p$, there is exactly one $m \in M_p$ such that $f$ is defined in $m$.
Then, $f$ is also denoted by $f_m$. For every $f \in F_p^{(n+1)}$ and
$c \in C_p^{(k)}$, there is exactly one equation of the form

$$f\ (c\ x_1 \ldots x_k)\ y_1 \ldots y_n = rhs_{p,f,c}$$

with $rhs_{p,f,c} \in RHS(f, C_p \cup F_p - \{f\}, X_k, Y_n)$, where for every
$f \in F$, $C' \subseteq C \cup F$, and $k, n \in \mathbb{N}$,
$RHS(f, C', X_k, Y_n)$ is the smallest set $RHS$ satisfying:

– For every $i \in [k]$ and $r_1, \ldots, r_n \in RHS$: $(f\ x_i\ r_1 \ldots r_n) \in RHS$.

– For every $c \in C'^{(a)}$ and $r_1, \ldots, r_a \in RHS$: $(c\ r_1 \ldots r_a) \in RHS$.

– For every $j \in [n]$: $y_j \in RHS$. □

Note that, in addition to constructors, defined function symbols may also be
contained in the second argument $C'$ of $RHS$ in the previous definition. The
functions in $C'$ may then be called with arbitrary arguments in right-hand
sides, whereas in recursive calls of $f$, the recursion argument must be an
$x_i$.
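
The inductive definition of $RHS(f, C', X_k, Y_n)$ can be read as a membership test on syntax trees. The following Haskell sketch illustrates this under assumed representations: right-hand sides are values of a hypothetical type Rhs, symbols are strings, and $C'$ is given as an association list from symbol names to ranks. None of these names come from the paper.

```haskell
-- Hypothetical syntax tree for right-hand sides.
data Rhs
  = VarX Int          -- recursion variable x_i
  | VarY Int          -- context variable y_j
  | App String [Rhs]  -- constructor or function symbol applied to arguments
  deriving (Eq, Show)

-- inRHS f cs k n r tests whether r lies in RHS(f, C', X_k, Y_n),
-- where cs lists the symbols of C' together with their ranks.
inRHS :: String -> [(String, Int)] -> Int -> Int -> Rhs -> Bool
inRHS f cs k n = go
  where
    go (VarY j) = 1 <= j && j <= n
    go (VarX _) = False  -- x_i may only occur as recursion argument of f
    go (App g (VarX i : rs))
      | g == f = 1 <= i && i <= k && length rs == n && all go rs
    go (App c rs) = lookup c cs == Just (length rs) && all go rs

-- Constructors of the running example with their ranks.
consEven :: [(String, Int)]
consEven = [("0", 0), ("S", 1), ("[]", 0), (":", 2)]

-- Right-hand side of  f (S x1) y1 y2 = f x1 (S (S y1)) (y1 : y2).
rhsStep :: Rhs
rhsStep =
  App "f" [VarX 1, App "S" [App "S" [VarY 1]], App ":" [VarY 1, VarY 2]]

-- A call whose recursion argument is not a variable x_i.
rhsBad :: Rhs
rhsBad = App "f" [App "f" [VarX 1, VarY 1, VarY 2], VarY 1, VarY 2]
```

Here rhsStep is accepted for $k = 1$ and $n = 2$, while rhsBad is rejected because its outer call of f does not have a variable $x_i$ as recursion argument.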

Example 2 Consider the programs $p_{acc}$ and $p_{non}$ from the introduction:

– $p_{acc} \in P$, where $M_{p_{acc}}$ contains one module $m_{acc,f}$ with the definition of $f$.

– $p_{non} \in P$, where $M_{p_{non}}$ contains modules $m_{non,f'}$, $m_{non,sub}$ defining $f'$ and sub. □

Now, we introduce the classes of tree transducers relevant for this paper. Since 
in our language every module defines exactly one function, to simplify the pre- 
sentation we also project this restriction on tree transducers. In the literature, 
more general classes of macro tree transducers [8,9] and modular tree transduc- 
ers [10] are studied, which allow mutual recursion. Our transformation could 
also handle these classes. In contrast to the literature, we include an initial call 
r in the definition of tree transducers, which has the form of a right-hand side. 

Definition 3 Let $p \in P$.

– A pair $(m, r)$ with $m \in M_p$ and $r \in RHS(f_m, C_p, X_1, Y_0)$ is called
  a one-state macro tree transducer of $p$ (for short 1-mtt of $p$), if for
  every $c \in C_p^{(k)}$ we have $rhs_{p,f_m,c} \in RHS(f_m, C_p, X_k, Y_n)$,
  where $f_m \in F_p^{(n+1)}$.

  Thus, the function $f_m$ from module $m$ may call itself in a
  primitive-recursive way, but it does not call any functions from other
  modules. Moreover, the initial call $r$ is a term built from $f_m$,
  constructors, and the variable $x_1$ as first argument of all subterms rooted
  with $f_m$.

– A triple $(m_1, m_2, r)$ with $m_1, m_2 \in M_p$ is called
  homomorphism-substitution modular tree transducer of $p$ (for short hsmodtt
  of $p$), if there are $n \in \mathbb{N}$ and pairwise distinct substitution
  constructors $\pi_1, \ldots, \pi_n \in C_p^{(0)}$, such that:

  1. $f_{m_1} \in F_p^{(1)}$ and $f_{m_2} = sub \in F_p^{(n+1)}$,

  2. for every $c \in C_p^{(k)}$ we have
     $rhs_{p,f_{m_1},c} \in RHS(f_{m_1}, C_p \cup \{sub\}, X_k, Y_0)$,

  3. $m_2$ contains the equations

     $sub\ \pi_j\ y_1 \ldots y_n = y_j$, for every $j \in [n]$

     $sub\ (c\ x_1 \ldots x_k)\ y_1 \ldots y_n = c\ (sub\ x_1\ y_1 \ldots y_n) \ldots (sub\ x_k\ y_1 \ldots y_n)$,
     for every $c \in (C_p - \{\pi_1, \ldots, \pi_n\})^{(k)}$

  4. $r \in RHS(f_{m_1}, (C_p - \{\pi_1, \ldots, \pi_n\}) \cup \{sub\}, X_1, Y_0)$.

  Thus, the function $f_{m_1}$ from the module $m_1$ is unary. In its
  right-hand sides, it may call itself primitive-recursively and it may call
  the function sub from the module $m_2$ with arbitrary arguments. The function
  sub has the special form of a substitution function, where
  $(sub\ t\ s_1 \ldots s_n)$ replaces all occurrences of the substitution
  constructors $\pi_1, \ldots, \pi_n$ in $t$ by $s_1, \ldots, s_n$,
  respectively. The initial call $r$ is as for 1-mtts, but it may also contain
  sub, whereas the substitution constructors $\pi_1, \ldots, \pi_n$ may not
  appear in it.

– A 1-mtt $(m, r)$ of $p$ is called nullary constructor disjoint (for short
  ncd), if there are pairwise different nullary constructors
  $c_1, \ldots, c_n \in C_p^{(0)}$, such that $r = (f_m\ x_1\ c_1 \ldots c_n)$
  and $c_1, \ldots, c_n$ do not occur in right-hand sides of $m$. An hsmodtt
  $(m_1, m_2, r)$ of $p$ is called ncd, if
  $r = (sub\ (f_{m_1}\ x_1)\ c_1 \ldots c_n)$ with pairwise different
  $c_1, \ldots, c_n \in C_p^{(0)} - \{\pi_1, \ldots, \pi_n\}$ that do not occur
  in right-hand sides of $m_1$.

– An hsmodtt $(m_1, m_2, r)$ of $p$ is initial value free (ivf), if $r = (f_{m_1}\ x_1)$. □
Example 4 (Ex. 2 continued)

– $(m_{acc,f}, r_{acc})$ with initial call $r_{acc} = (f\ x_1\ 0\ [])$ is a
  1-mtt of $p_{acc}$ that is ncd.

– Our transformation consists of the two steps “decomposition” and
  “constructor replacement”. Decomposition transforms $p_{acc}$ into the
  following program $p_{dec} \in P$, which contains the modules $m_{dec,f'}$
  and $m_{dec,sub}$.

    f' (S x1) = sub (f' x1) (S (S π1)) (π1 : π2)
    f' 0      = π2

    sub (x1 : x2) y1 y2 = (sub x1 y1 y2) : (sub x2 y1 y2)     sub [] y1 y2 = []
    sub (S x1)    y1 y2 = S (sub x1 y1 y2)                     sub π1 y1 y2 = y1
    sub 0         y1 y2 = 0                                    sub π2 y1 y2 = y2

  Here, $(m_{dec,f'}, m_{dec,sub}, r_{dec})$ with the initial call
  $r_{dec} = (sub\ (f'\ x_1)\ 0\ [])$ is an hsmodtt of $p_{dec}$ that is ncd,
  but not ivf.

– $(m_{non,f'}, m_{non,sub}, r_{non})$ with $r_{non} = (f'\ x_1)$ and the
  modules from the introduction is an hsmodtt of $p_{non}$ that is ivf
  ($n = 2$, $\pi_1 = 0$, $\pi_2 = []$). □
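
To see that the decomposed program still computes the same lists, $p_{acc}$ and $p_{dec}$ can be run side by side. The Haskell sketch below assumes one untyped term type Term that also contains the substitution constructors (written Pi1 and Pi2 here). The names rAcc, rDec and num are hypothetical.

```haskell
-- Untyped terms, extended by the substitution constructors pi_1 and pi_2.
data Term = Z | S Term | Nil | Cons Term Term | Pi1 | Pi2
  deriving (Eq, Show)

-- The accumulative function f of p_acc (unused equations left out).
f :: Term -> Term -> Term -> Term
f (S x1) y1 y2 = f x1 (S (S y1)) (Cons y1 y2)
f Z      _  y2 = y2

-- The unary function f' of p_dec (unused equations left out).
f' :: Term -> Term
f' (S x1) = sub (f' x1) (S (S Pi1)) (Cons Pi1 Pi2)
f' Z      = Pi2

-- The substitution function of p_dec: only Pi1 and Pi2 are replaced.
sub :: Term -> Term -> Term -> Term
sub (Cons x1 x2) y1 y2 = Cons (sub x1 y1 y2) (sub x2 y1 y2)
sub (S x1)       y1 y2 = S (sub x1 y1 y2)
sub Z            _  _  = Z
sub Nil          _  _  = Nil
sub Pi1          y1 _  = y1
sub Pi2          _  y2 = y2

-- Initial calls r_acc and r_dec, as functions of the input tree.
rAcc, rDec :: Term -> Term
rAcc x1 = f x1 Z Nil
rDec x1 = sub (f' x1) Z Nil

-- Hypothetical helper: the term S^n 0.
num :: Int -> Term
num n = iterate S Z !! n
```

Comparing rAcc and rDec on a few inputs illustrates the statement of Lemma 5 for this example. It does not replace the proof.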

For every program $p \in P$, its evaluation is described by a
(nondeterministic) reduction relation $\Rightarrow_p$ on $T_{C_p \cup F_p}$. As
usual, $\Rightarrow_p^n$ and $\Rightarrow_p^*$ denote the $n$-fold composition
and the transitive, reflexive closure of $\Rightarrow_p$, respectively. If
$t \Rightarrow_p^* t'$ and there is no $t''$ such that $t' \Rightarrow_p t''$,
then $t'$ is called a normal form of $t$, which is denoted by $nf_p(t)$, if it
exists and is unique. It can be proved in analogy to [10] that for every
program $p \in P$, hsmodtt $(m_1, m_2, r)$ of $p$ (and 1-mtt $(m, r)$ of $p$),
and $t \in T_{\{f_{m_1}, f_{m_2}\} \cup C_p}$ (and
$t \in T_{\{f_m\} \cup C_p}$, respectively), there exists a unique normal form
$nf_p(t)$. In particular, for every $t \in T_{C_p}$ the normal form
$nf_p(r[x_1/t])$ exists. The proof is based on the result that for every modtt
and mtt the corresponding reduction relation is terminating and confluent. The
normal form $nf_p(r[x_1/t])$ is called the output tree computed for the input
tree $t$.

3 Deaccumulation 

To improve verifiability we transform accumulative programs into non-accumula- 
tive programs by translating 1-mtts into hsmodtts. The defined functions of 
the resulting programs have no context arguments at all or they have context 
arguments that are not accumulating. Moreover, the resulting initial calls have no 
initial values in context argument positions. The transformation proceeds in two 
steps: “decomposition” (Sect. 3.1) and “constructor replacement” (Sect. 3.2). 

3.1 Decomposition 

In [8,9,10] it was shown that every mtt (with possibly several functions of
arbitrary arity) can be decomposed into a top-down tree transducer (an mtt with
unary functions only) plus a substitution device. In this paper, we use a
modification of this result, integrating the constructions of Lemmata 21 and 23
of [19]. The key idea is to simulate an $(n+1)$-ary function $f$ by a new unary
function $f'$. To this end, all context arguments are deleted and only the
recursion argument is maintained. Since $f'$ does not know the current values
of its context arguments, it uses a new constructor $\pi_j$, whenever $f$ uses
its $j$-th context argument. For this purpose, every occurrence of $y_j$ in the
right-hand sides of equations for $f$ is replaced by $\pi_j$. The current
context arguments themselves are integrated into the calculation by replacing
every occurrence of the form $(f\ x_i \ldots)$ in a right-hand side or in the
initial call by $(sub\ (f'\ x_i) \ldots)$. Here, the new function sub is a
substitution function. As explained before, $(sub\ t\ s_1 \ldots s_n)$ replaces
every $\pi_j$ in the first argument $t$ of sub by the $j$-th context argument
$s_j$.

Lemma 5 For every $p \in P$ and 1-mtt $(m, r)$ of $p$, there are $p' \in P$ and
an hsmodtt $(m_1, m_2, r')$ of $p'$ such that for every $t \in T_{C_p}$:
$nf_p(r[x_1/t]) = nf_{p'}(r'[x_1/t])$. Additionally, if $(m, r)$ is ncd, then
$(m_1, m_2, r')$ is ncd, too.

Proof. We construct $p' \in P$ by adding modules $m_1$ and $m_2$ to $p$, and we
construct $r'$ from $r$. Let $n \in \mathbb{N}$, $f = f_m \in F_p^{(n+1)}$,
$f' \in (F - F_p)^{(1)}$, $sub \in (F - F_p)^{(n+1)}$ with $sub \neq f'$, and
pairwise distinct $\pi_1, \ldots, \pi_n \in (C - C_p)^{(0)}$.

1. For every $c \in C_p^{(k)}$ and for every equation
   $f\ (c\ x_1 \ldots x_k)\ y_1 \ldots y_n = rhs_{p,f,c}$ in $m$, the module
   $m_1$ contains $f'\ (c\ x_1 \ldots x_k) = dec(rhs_{p,f,c})$, where
   $dec : RHS(f, C_p, X_k, Y_n) \to RHS(f', C_p \cup \{sub\} \cup \{\pi_1, \ldots, \pi_n\}, X_k, Y_0)$
   with:

   $$\begin{array}{ll}
   dec(f\ x_i\ r_1 \ldots r_n) = sub\ (f'\ x_i)\ dec(r_1) \ldots dec(r_n), & \text{for all } i \in [k],\ r_1, \ldots, r_n \in RHS(f, C_p, X_k, Y_n) \\
   dec(c'\ r_1 \ldots r_a) = c'\ dec(r_1) \ldots dec(r_a), & \text{for all } c' \in C_p^{(a)},\ r_1, \ldots, r_a \in RHS(f, C_p, X_k, Y_n) \\
   dec(y_j) = \pi_j, & \text{for all } j \in [n]
   \end{array}$$

   For every $j \in [n]$, $m_1$ contains a dummy-equation $f'\ \pi_j = \pi_j$.

2. $m_2$ contains the equations

   $$\begin{array}{ll}
   sub\ (c\ x_1 \ldots x_k)\ y_1 \ldots y_n = c\ (sub\ x_1\ y_1 \ldots y_n) \ldots (sub\ x_k\ y_1 \ldots y_n), & \text{for all } c \in C_p^{(k)} \\
   sub\ \pi_j\ y_1 \ldots y_n = y_j, & \text{for all } j \in [n]
   \end{array}$$

3. $r' = dec(r)$.

Note that $(m_1, m_2, r')$ is an hsmodtt of $p'$. Moreover, for every
$t \in T_{C_p}$, we have $nf_p(r[x_1/t]) = nf_{p'}(r'[x_1/t])$. For the proof
of this statement, the following statements (*) and (**) are proved by
simultaneous induction (cf., e.g., [9,11,25]). For space reasons we omit this
proof.

(*) For every $t \in T_{C_p}$ and $s_1, \ldots, s_n \in T_{C_p \cup \{\pi_1, \ldots, \pi_n\}}$:
    $nf_p(f\ t\ s_1 \ldots s_n) = nf_{p'}(sub\ (f'\ t)\ s_1 \ldots s_n)$.

(**) For every $k \in \mathbb{N}$, $t_1, \ldots, t_k \in T_{C_p}$,
    $\bar{r} \in RHS(f, C_p, X_k, Y_n)$, and
    $s_1, \ldots, s_n \in T_{C_p \cup \{\pi_1, \ldots, \pi_n\}}$:
    $nf_p(\bar{r}[x_j/t_j][y_j/s_j]) = nf_{p'}(sub\ (dec(\bar{r})[x_j/t_j])\ s_1 \ldots s_n)$.

Moreover, if $(m, r)$ is ncd, then there are pairwise different
$c_1, \ldots, c_n \in C_p^{(0)}$ such that $r = (f\ x_1\ c_1 \ldots c_n)$ and
$c_1, \ldots, c_n$ do not occur in right-hand sides of $m$. Thus,
$r' = (sub\ (f'\ x_1)\ c_1 \ldots c_n)$ and by the definition of $dec$,
$c_1, \ldots, c_n$ are not introduced into right-hand sides of $m_1$. Hence,
$(m_1, m_2, r')$ is ncd, too. □
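
The three defining cases of $dec$ translate almost literally into a recursive function on syntax trees. The following Haskell sketch assumes a hypothetical type Rhs for right-hand sides and represents symbols as strings. The new function is named by appending a prime to the old name, and the substitution function is always called "sub". These representation choices are illustrative and not prescribed by the construction.

```haskell
-- Hypothetical syntax tree for right-hand sides and initial calls.
data Rhs
  = VarX Int           -- recursion variable x_i
  | VarY Int           -- context variable y_j
  | Pi Int             -- substitution constructor pi_j
  | Con String [Rhs]   -- constructor application
  | Call String [Rhs]  -- function call
  deriving (Eq, Show)

-- dec f r: drop the context arguments of every call of f and wrap the
-- resulting unary call in sub; replace y_j by pi_j.
dec :: String -> Rhs -> Rhs
dec f (Call g (VarX i : rs))
  | g == f = Call "sub" (Call (f ++ "'") [VarX i] : map (dec f) rs)
dec f (Con c rs) = Con c (map (dec f) rs)
dec _ (VarY j)   = Pi j
dec _ r          = r  -- other shapes do not occur in 1-mtt right-hand sides

-- Right-hand side of  f (S x1) y1 y2 = f x1 (S (S y1)) (y1 : y2).
accStep :: Rhs
accStep =
  Call "f" [VarX 1, Con "S" [Con "S" [VarY 1]], Con ":" [VarY 1, VarY 2]]

-- Initial call r_acc = (f x1 0 []).
accInit :: Rhs
accInit = Call "f" [VarX 1, Con "0" [], Con "[]" []]
```

Applying dec "f" to accStep, to VarY 2 and to accInit yields the two right-hand sides of $f'$ and the initial call of $p_{dec}$ from Ex. 4.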

Example 6 Decomposition translates the 1-mtt $(m_{acc,f}, r_{acc})$ of
$p_{acc}$ into the hsmodtt $(m_{dec,f'}, m_{dec,sub}, r_{dec})$ of $p_{dec}$,
which are both ncd, cf. Ex. 4. □

However, we have not yet improved the automatic verifiability of programs: 

Example 7 Let $(m_{dec,f'}, m_{dec,sub}, r_{dec})$ be the hsmodtt of $p_{dec}$
created by decomposition and resume the proof attempt from the introduction.
Since the initial call has changed from $(f\ x_1\ 0\ [])$ to
$(sub\ (f'\ x_1)\ 0\ [])$, we have to prove
$(sub\ (f'\ x_1)\ 0\ []) = (q\ x_1)$ by induction. Again, the automatic proof
fails, because in the induction step $(x_1 \mapsto (S\ x_1))$ the induction
hypothesis cannot be successfully applied to prove
$(sub\ (f'\ (S\ x_1))\ 0\ []) = (q\ (S\ x_1))$. The problem is that the context
arguments of $(sub\ (f'\ x_1)\ (S\ (S\ \pi_1))\ (\pi_1 : \pi_2))$, which
originates as subterm from rule application to $(sub\ (f'\ (S\ x_1))\ 0\ [])$,
do not match the context arguments of the term $(sub\ (f'\ x_1)\ 0\ [])$ in the
induction hypothesis. □

3.2 Constructor Replacement 

We solve the above problem by avoiding applications of substitution functions
(with specific context arguments like 0 and [] in Ex. 7) in initial calls. Since
then an initial call consists only of a unary function, induction hypotheses can
be applied without paying attention to context arguments. The idea, illustrated
on Ex. 7, is to replace the substitution constructors $\pi_1$ and $\pi_2$ by 0
and [] from the initial call. Thus, the initial values of sub's context arguments
are encoded into the program and the substitution in the initial call becomes
superfluous.

We restrict ourselves to 1-mtts that are ncd. Then, after decomposition, the
initial calls have the form $(sub\ (f'\ x_1)\ c_1 \ldots c_n)$, where
$c_1, \ldots, c_n$ are pairwise different. Thus, when replacing each $\pi_j$ by
$c_j$, there is a unique correspondence between the nullary constructors
$c_1, \ldots, c_n$ and the substitution constructors $\pi_1, \ldots, \pi_n$. In
Ex. 10 we will demonstrate the problems with identical $c_1, \ldots, c_n$.

When replacing $\pi_j$ by $c_j$, the constructors $c_1, \ldots, c_n$ now have
two roles: If $c_j$ occurs within a first argument of sub, then it acts like the
former substitution constructor $\pi_j$, i.e., it will be substituted by the
$j$-th context argument of sub. Thus, sub now has the defining equation
$sub\ c_j\ y_1 \ldots y_n = y_j$. Only occurrences of $c_j$ outside of sub's
first argument are left unchanged, i.e., here the constructor $c_j$ stands for
its original value. To make sure that there is no conflict between these two
roles of $c_j$, we again need the ncd-condition. It ensures that originally,
$c_j$ did not occur in right-hand sides of $f$'s definition. Then the only
occurrence of $c_j$, which does not stand for the substitution constructor
$\pi_j$, is as context argument of sub in the initial call. This substitution,
however, can be omitted, because the call $(sub\ (f'\ x_1)\ c_1 \ldots c_n)$
would now just mean to replace every $c_j$ in $(f'\ x_1)$ by $c_j$. In this
way, the resulting hsmodtt is initial value free (ivf).

Lemma 8 Let $p \in P$ and $(m_1, m_2, r)$ be an hsmodtt of $p$ as constructed
in the transformation of Lemma 5. Moreover, let $(m_1, m_2, r)$ be ncd and
$\pi_1, \ldots, \pi_n$ be its substitution constructors. Then, there are
$p' \in P$ and an hsmodtt $(m_1', m_2', r')$ of $p'$ that is ivf, such that for
all $t \in T_{C_p - \{\pi_1, \ldots, \pi_n\}}$:
$nf_p(r[x_1/t]) = nf_{p'}(r'[x_1/t])$.

Proof. We construct $p' \in P$ by replacing $m_1$ and $m_2$ in $p$ by modules
$m_1'$ and $m_2'$, and we define $r'$. Let $f = f_{m_1} \in F_p^{(1)}$,
$sub = f_{m_2} \in F_p^{(n+1)}$, and
$c_1, \ldots, c_n \in C_p^{(0)} - \{\pi_1, \ldots, \pi_n\}$ be pairwise
distinct, such that $r = (sub\ (f\ x_1)\ c_1 \ldots c_n)$ and
$c_1, \ldots, c_n$ do not occur in right-hand sides of $m_1$. Let
$C_{p'} = C_p - \{\pi_1, \ldots, \pi_n\}$.

1. For every $c \in C_{p'}^{(k)}$ and for every equation
   $f\ (c\ x_1 \ldots x_k) = rhs_{p,f,c}$ in $m_1$, the module $m_1'$ contains
   $f\ (c\ x_1 \ldots x_k) = repl(rhs_{p,f,c})$, where
   $repl : RHS(f, (C_p - \{c_1, \ldots, c_n\}) \cup \{sub\}, X_k, Y_0) \to RHS(f, C_{p'} \cup \{sub\}, X_k, Y_0)$
   replaces every occurrence of $\pi_j$ by $c_j$, for all $j \in [n]$.

2. $m_2'$ contains the equations

   $$\begin{array}{ll}
   sub\ (c\ x_1 \ldots x_k)\ y_1 \ldots y_n = c\ (sub\ x_1\ y_1 \ldots y_n) \ldots (sub\ x_k\ y_1 \ldots y_n), & \text{for all } c \in C_{p'}^{(k)} - \{c_1, \ldots, c_n\} \\
   sub\ c_j\ y_1 \ldots y_n = y_j, & \text{for all } j \in [n]
   \end{array}$$

3. $r' = f\ x_1$.

Note that $(m_1', m_2', r')$ is an hsmodtt of $p'$ that is ivf. For every
$t \in T_{C_{p'}}$, we have $nf_p(r[x_1/t]) = nf_{p'}(r'[x_1/t])$. For the
proof of this statement, the following statements (*) and (**) are proved by
simultaneous induction. For space reasons we omit this proof.

(*) For every $t \in T_{C_{p'}}$ and $s_1, \ldots, s_n \in T_{C_{p'}}$:
    $nf_p(sub\ (f\ t)\ s_1 \ldots s_n) = nf_{p'}(sub\ (f\ t)\ s_1 \ldots s_n)$.

(**) For every $k \in \mathbb{N}$, $t_1, \ldots, t_k \in T_{C_{p'}}$,
    $\bar{r} \in RHS(f, (C_p - \{c_1, \ldots, c_n\}) \cup \{sub\}, X_k, Y_0)$,
    and $s_1, \ldots, s_n \in T_{C_{p'}}$:
    $nf_p(sub\ (\bar{r}[x_j/t_j])\ s_1 \ldots s_n) = nf_{p'}(sub\ (repl(\bar{r})[x_j/t_j])\ s_1 \ldots s_n)$. □



Example 9 Constructor replacement translates the ncd hsmodtt
$(m_{dec,f'}, m_{dec,sub}, r_{dec})$ of $p_{dec}$ into the ivf hsmodtt
$(m_{non,f'}, m_{non,sub}, r_{non})$ of $p_{non}$. Essentially, all occurrences
of $\pi_1$ and $\pi_2$ are replaced by 0 and []. □
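
Constructor replacement on right-hand sides is a plain renaming of the substitution constructors. A minimal Haskell sketch, assuming a hypothetical syntax tree type Rhs and passing the nullary constructors $c_1, \ldots, c_n$ of the initial call as a list of names (so that the $j$-th list entry plays the role of $c_j$):

```haskell
-- Hypothetical syntax tree for right-hand sides of the unary function f'.
data Rhs
  = VarX Int           -- recursion variable x_i
  | Pi Int             -- substitution constructor pi_j
  | Con String [Rhs]   -- constructor application
  | Call String [Rhs]  -- function call
  deriving (Eq, Show)

-- repl cs r replaces every pi_j in r by the nullary constructor c_j,
-- where c_j is the j-th element of cs (counting from 1).
repl :: [String] -> Rhs -> Rhs
repl cs (Pi j)      = Con (cs !! (j - 1)) []
repl cs (Con c rs)  = Con c (map (repl cs) rs)
repl cs (Call g rs) = Call g (map (repl cs) rs)
repl _  r           = r

-- Right-hand side of  f' (S x1) = sub (f' x1) (S (S pi_1)) (pi_1 : pi_2).
decStep :: Rhs
decStep =
  Call "sub" [ Call "f'" [VarX 1]
             , Con "S" [Con "S" [Pi 1]]
             , Con ":" [Pi 1, Pi 2] ]
```

With cs = ["0", "[]"], repl maps decStep and Pi 2 to the right-hand sides of $f'$ in $p_{non}$.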

Now we demonstrate the problems with hsmodtts violating the condition ncd: 

Example 10 Assume that $p_{acc}$ and $r_{acc}$ are changed into the following
program:

    f (S x1) y1 y2 = f x1 (S (S y1)) (y1 + y2)
    f 0      y1 y2 = y2

and the initial call $(f\ x_1\ 0\ 0)$, computing the sum of the first $x_1$
even numbers. Now the same constructor 0 occurs in the initial values for both
context arguments. Decomposition delivers the program²:

    f' (S x1) = sub (f' x1) (S (S π1)) (π1 + π2)
    f' 0      = π2

    sub (x1 + x2) y1 y2 = (sub x1 y1 y2) + (sub x2 y1 y2)     sub π1 y1 y2 = y1
    sub (S x1)    y1 y2 = S (sub x1 y1 y2)                     sub π2 y1 y2 = y2
    sub 0         y1 y2 = 0

and initial call $(sub\ (f'\ x_1)\ 0\ 0)$. Constructor replacement would replace
$\pi_1$ and $\pi_2$ by 0, which leads to different rules
$sub\ 0\ y_1\ y_2 = y_1$ and $sub\ 0\ y_1\ y_2 = y_2$ with the same left-hand
side. In Sect. 5 we give an idea how to overcome this problem. □
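
The clash can be detected mechanically before constructor replacement is attempted. The sketch below is an illustrative check, not part of the paper's construction. It assumes that the initial values $c_1, \ldots, c_n$ are given as a list of constructor names, and the function names are hypothetical.

```haskell
import Data.List (nub)

-- For initial values c_1, ..., c_n, constructor replacement would create the
-- equations  sub c_j y_1 ... y_n = y_j.  Each pair (c_j, j) stands for one.
subBaseRules :: [String] -> [(String, Int)]
subBaseRules cs = zip cs [1 ..]

-- The equations are unambiguous only if no constructor occurs twice on the
-- left, i.e., if the initial values are pairwise different.
unambiguous :: [String] -> Bool
unambiguous cs = nub lhss == lhss
  where
    lhss = map fst (subBaseRules cs)
```

For the running example the initial values 0 and [] pass this check, whereas the initial values 0 and 0 of this example do not.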

We conclude this section with some statements about substitution functions 
which are often helpful for the verification of transformed programs (cf. the 
examples in Sect. 4 and App. A and B). Instead of proving these statements 
during verification, they should be generated during program transformation. 
This is possible because the substitution functions only depend on the set of 
constructors but not on the transformed function. 

Lemma 11 Let $p \in P$ and $(m_1, m_2, r)$ be an hsmodtt of $p$ with
substitution constructors $c_1, \ldots, c_n$ and substitution function sub.

1. $A_{sub}$ (Associativity of sub). For every
   $t_0, t_1, \ldots, t_n, s_1, \ldots, s_n \in T_{C_p}$, we have
   $nf_p(sub\ (sub\ t_0\ t_1 \ldots t_n)\ s_1 \ldots s_n) = nf_p(sub\ t_0\ (sub\ t_1\ s_1 \ldots s_n) \ldots (sub\ t_n\ s_1 \ldots s_n))$.

2. $U_{sub}$ (Right Units of sub). For every $t \in T_{C_p}$, we have
   $nf_p(sub\ t\ c_1 \ldots c_n) = t$.

3. $+_{sub}$ (Addition by sub). If $n = 1$, $C_p = \{S, 0\}$, and
   $nf_p((S^{z_1}\ 0) + (S^{z_2}\ 0)) = S^{z_1 + z_2}\ 0$ for all
   $z_1, z_2 \in \mathbb{N}$, then $nf_p(sub\ s\ t) = nf_p(s + t)$ for all
   $s, t \in T_{C_p}$.

Proof. The proofs are straightforward inductions on $T_{C_p}$ and $\mathbb{N}$, respectively. □
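
The first two statements can be checked exhaustively on small terms for the substitution function of $p_{non}$ ($n = 2$, with substitution constructors 0 and []). The Haskell sketch below is such a bounded test under an assumed untyped term type Term. The generator terms and the property names are hypothetical, and a bounded test is of course weaker than the inductive proofs.

```haskell
data Term = Z | S Term | Nil | Cons Term Term deriving (Eq, Show)

-- Substitution function of p_non: 0 and [] are the substitution constructors.
sub :: Term -> Term -> Term -> Term
sub (Cons x1 x2) y1 y2 = Cons (sub x1 y1 y2) (sub x2 y1 y2)
sub (S x1)       y1 y2 = S (sub x1 y1 y2)
sub Z            y1 _  = y1
sub Nil          _  y2 = y2

-- All terms of depth at most d.
terms :: Int -> [Term]
terms 0 = [Z, Nil]
terms d = [Z, Nil] ++ map S ts ++ [Cons a b | a <- ts, b <- ts]
  where
    ts = terms (d - 1)

-- A_sub: sub (sub t0 t1 t2) s1 s2 == sub t0 (sub t1 s1 s2) (sub t2 s1 s2).
assocSub :: Term -> Term -> Term -> Term -> Term -> Bool
assocSub t0 t1 t2 s1 s2 =
  sub (sub t0 t1 t2) s1 s2 == sub t0 (sub t1 s1 s2) (sub t2 s1 s2)

-- U_sub: substituting the substitution constructors by themselves is the identity.
rightUnit :: Term -> Bool
rightUnit t = sub t Z Nil == t

checkSmall :: Bool
checkSmall =
  and [ assocSub t0 t1 t2 s1 s2
      | t0 <- ts, t1 <- ts, t2 <- ts, s1 <- ts, s2 <- ts ]
    && all rightUnit (terms 2)
  where
    ts = terms 1
```

Because sub depends only on the constructors, the same check applies unchanged to any program that shares this substitution function.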
² During the transformation, + is treated as an ordinary binary constructor.

4 Related Work

Program transformations are a well-established field in software engineering and 
compiler construction (see, e.g., [2,6,21,22]). However, we suggested a novel ap- 
plication area for program transformations by applying them in order to increase 
verifiability. This goal is often in contrast to the classical aim of increasing effi- 
ciency, since a more efficient program is usually harder to verify. In particular, 
while composition results from the theory of tree transducers are usually applied 
in order to improve the efficiency of functional programs (cf., e.g., [18,19,24,25]), 
we have demonstrated that also the corresponding decomposition results are not 
only of theoretical interest. 

Program transformations that improve verifiability have rarely been investigated
before. A first step into this direction was taken in [12]. There, two
transformations were presented that can remove accumulators. They are based
on the associativity and commutativity of auxiliary functions like + occurring
in accumulator arguments. The advantage of the approach in [12] is that it does
not require the strict syntactic restrictions of 1-mtts that are ncd. Moreover,
[12] does not require that functions from other modules may not be called in
right-hand sides. Because of that restriction, in the present paper, we have to
treat all auxiliary functions like + as constructors and exclude the use of any
information about these functions during the transformation.

On the other hand, the technique of [12] can essentially only remove one
accumulator argument (e.g., in contrast to our method, it cannot eliminate both
accumulators of P_acc). Moreover, the approach in [12] relies on knowledge about
auxiliary functions like +. Hence, it is not applicable if the context of
accumulator arguments on the right-hand side is not associative or commutative. Thus,
it fails on examples like the following program P_exp. In particular, this
demonstrates that in contrast to [12], our technique can also handle nested recursion.
Indeed, deaccumulation is useful for functional programs in general, not just
for functions resulting from translating imperative programs.

exp (S x1) y1 = exp x1 (exp x1 y1)
exp 0 y1      = S y1

The initial call is (exp x1 0). We want to prove (exp x1 0) = (e x1), where
(e (S^n 0)) computes (S^(2^n) 0), see below. Here, (S^n1 0) + (S^n2 0) computes
S^(n1+n2) 0.

e (S x1) = (e x1) + (e x1)
e 0      = S 0

Since exp is a 1-mtt that is ncd, deaccumulation delivers the program:

exp' (S x1) = sub (exp' x1) (sub (exp' x1) 0)      sub (S x1) y1 = S (sub x1 y1)
exp' 0      = S 0                                  sub 0 y1      = y1

and the initial call (exp' x1), which are better suited for induction provers,
because there are no accumulating arguments anymore. For instance, instead of
proving (exp x1 0) = (e x1) for the original program (which requires a generalization),
now the statement (exp' x1) = (e x1) (taken as induction hypothesis IH)
can be proved automatically. We only show the induction step (x1 ↦ (S x1)).

  exp' (S x1)
    = sub (exp' x1) (sub (exp' x1) 0)
    = sub (e x1) (sub (e x1) 0)          (2 * IH)
    = sub (e x1) (e x1)                  (Usub)
    = (e x1) + (e x1)                    (+sub)
    = e (S x1)
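
The equality of the accumulative and the deaccumulated program can also be checked on small inputs. The following Python sketch is only an illustration: it assumes that Peano numerals are encoded as non-negative Python integers, and the names `exp_acc`, `exp_prime`, `sub` and `e` are chosen here for the functions exp, exp', sub and e.

```python
def exp_acc(x, y):
    """exp (S x1) y1 = exp x1 (exp x1 y1);  exp 0 y1 = S y1."""
    if x == 0:
        return y + 1
    return exp_acc(x - 1, exp_acc(x - 1, y))


def sub(x, y):
    """sub (S x1) y1 = S (sub x1 y1);  sub 0 y1 = y1."""
    if x == 0:
        return y
    return 1 + sub(x - 1, y)


def exp_prime(x):
    """exp' (S x1) = sub (exp' x1) (sub (exp' x1) 0);  exp' 0 = S 0."""
    if x == 0:
        return 1
    r = exp_prime(x - 1)
    return sub(r, sub(r, 0))


def e(x):
    """e (S x1) = (e x1) + (e x1);  e 0 = S 0."""
    if x == 0:
        return 1
    return e(x - 1) + e(x - 1)


# Compare all three definitions on a few small numerals.
agree = all(exp_acc(n, 0) == exp_prime(n) == e(n) for n in range(7))
```

Such a test is of course no substitute for the inductive proof above; it only confirms the statement for the inputs that were tried.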



While in many examples generalizations can be avoided by our technique, it 
does not render generalization techniques superfluous. There exist accumulative 
functions where our transformation is not applicable, cf. Ex. 10, and even if it is
applicable, there may still be conjectures that can only be proved via a suitable 
generalization. However, even then our transformation is advantageous, because 
the generalizations for the transformed functions are usually much easier than 
the ones required for the original accumulative functions (cf. App. A). 



5 Conclusion and Future Work 

Imperative programs and accumulative functional programs resulting from their 
translation are hard to verify with induction provers. Therefore, we introduced an 
automatic technique that transforms accumulative functions into non-accumula- 
tive functions, whose verification is often significantly easier with existing proof 
tools. However, it remains to characterize (at least informally) the class of veri- 
fication problems, for which there is a real improvement. 

To increase the applicability of our approach, we plan to extend it to more
general forms of algorithms. For example, the requirement ncd should be weakened,
such that examples with equal constructors in initial calls can be handled
as well. The idea is to use different substitution functions such that at every node
of a tree it can be read from the substitution function how a nullary constructor
has to be substituted. To this end, one must analyze the decomposed program
prior to constructor replacement to find out which substitution constructors can
occur in which contexts. For instance, in Ex. 10 it can be shown that π1 can
only occur in a left subtree of a +, whereas π2 cannot occur in such positions.
Thus, in the program after constructor replacement every occurrence of a 0 in
a left subtree of a + must be substituted by y1, whereas all other occurrences
must be substituted by y2.

An extension beyond mtts seems to be possible as well. For example, the re- 
quirement of flat patterns on left-hand sides may be relaxed. Moreover, one could 
consider different constructor terms instead of nullary constructors in initial calls. 
Further extensions include a decomposition that only removes those context 
arguments from a function that are modified in recursive calls. Finally, we also 
investigate how to incorporate the transformations of [12] into our approach. 

Footnote: Note that for Ex. 10, however, one can construct an equivalent
non-accumulative program, cf. Sect. 5.




Deaccumulation - Improving Provability 159 



References 

1. S. Autexier, D. Hutter, H. Mantel, and A. Schairer. Inka 5.0 - A logical voyager. 
In Proc. CADE-16, LNAI 1632, pages 207-211, 1999. 

2. F. L. Bauer and H. Wössner. Algorithmic Language and Program Development.
Springer-Verlag, 1982.

3. A. Bouhoula and M. Rusinowitch. Implicit induction in conditional theories. Jour- 
nal of Automated Reasoning, 14:189-235, 1995. 

4. R. S. Boyer and J S. Moore. A Computational Logic. Academic Press, 1979. 

5. A. Bundy, A. Stevens, F. van Harmelen, A. Ireland, and A. Smaill. Rippling: A 
heuristic for guiding inductive proofs. Artificial Intelligence, 63:185-253, 1993. 

6. R. M. Burstall and J. Darlington. A transformation system for developing recursive 
programs. Journal of the ACM, 24:44-67, 1977. 

7. E. W. Dijkstra. Invariance and non-determinacy. In Mathematical Logic and Pro- 
gramming Languages, chapter 9, pages 157-165. Prentice-Hall, 1985. 

8. J. Engelfriet. Some open questions and recent results on tree transducers and tree 
languages. In R. V. Book (ed.), Formal language theory; perspectives and open 
problems, pages 241-286. Academic Press, 1980. 

9. J. Engelfriet and H. Vogler. Macro tree transducers. JCSS, 31:71-145, 1985. 

10. J. Engelfriet and H. Vogler. Modular tree transducers. TCS, 78:267-304, 1991. 

11. Z. Fülöp and H. Vogler. Syntax-directed semantics: Formal models based on tree
transducers. Monographs in Theoretical Comp. Science, EATCS. Springer, 1998.

12. J. Giesl. Context-moving transformations for function verification. In Proc.
LOPSTR'99, LNCS 1817, pages 293-312, 2000.

13. C. A. R. Hoare. An axiomatic basis for computer programming. Communications 
of the ACM, 12:576-583, 1969. 

14. A. Ireland and A. Bundy. Automatic verification of functions with accumulating 
parameters. Journal of Functional Programming, 9:225-245, 1999. 

15. A. Ireland and J. Stark. On the automatic discovery of loop invariants. In 4th
NASA Langley Formal Methods Workshop. NASA Conf. Publication 3356, 1997.

16. D. Kapur and H. Zhang. An overview of rewrite rule laboratory (RRL). Journal 
of Computer and Mathematics with Applications, 29:91-114, 1995. 

17. M. Kaufmann, P. Manolios, and J. S. Moore. Computer-Aided Reasoning: An 
Approach. Kluwer, 2000. 

18. A. Kühnemann. Benefits of tree transducers for optimizing functional programs.
In Proc. FST&TCS'98, LNCS 1530, pages 146-157, 1998.

19. A. Kühnemann, R. Glück, K. Kakehi. Relating accumulative and non-accumulative
functional programs. In Proc. RTA'01, LNCS 2051, pages 154-168, 2001.

20. J. McCarthy. Recursive functions of symbolic expressions and their computation 
by machine. Communications of the ACM, 3:184-195, 1960. 

21. H. Partsch. Specification and Transformation of Programs. Springer-Verlag, 1990.

22. A. Pettorossi and M. Proietti. Rules and strategies for transforming functional 
and logic programs. ACM Computing Surveys, 28:360-414, 1996. 

23. J. Stark and A. Ireland. Invariant discovery via failed proof attempts. In Proc. 
LOPSTR’98, LNCS 1559, pages 271-288, 1998. 

24. J. Voigtländer. Conditions for efficiency improvement by tree transducer
composition. In Proc. RTA'02, LNCS 2378, pages 222-236, 2002.

25. J. Voigtländer and A. Kühnemann. Composition of functions with accumulating
parameters. To appear in Journal of Functional Programming, 2004.

26. C. Walther. Mathematical induction. In Gabbay, Hogger, Robinson (eds.), Handbook
of Logic in AI & Logic Prog., Vol. 2, pages 127-228. Oxford University Press, 1994.







Jürgen Giesl, Armin Kühnemann, and Janis Voigtländer



A Example: Splitting Monadic Trees

The program

    split (A x1) y1 = A (split x1 y1)
    split (B x1) y1 = split x1 (B y1)
    split N y1      = y1

with initial call (split x1 N) translates monadic trees with n1 and n2 occurrences
of the unary constructors A and B, respectively, into the tree A^n1 (B^n2 N) by
accumulating the B's in the context argument of split. It is transformed into:

    split' (A x1) = A (sub (split' x1) N)      sub (A x1) y1 = A (sub x1 y1)
    split' (B x1) = sub (split' x1) (B N)      sub (B x1) y1 = B (sub x1 y1)
    split' N      = N                          sub N y1      = y1

with initial call (split' x1). If we want to prove the idempotence of the splitting
operation, then the proof for the original program requires a generalization
from (split (split x1 N) N) = (split x1 N) to (split (split x1 (b x2)) (b x3)) =
(split x1 (b (x2 + x3))), where (b n) computes (B^n N). Such a generalization is
difficult to find. On the other hand, (split' (split' x1)) = (split' x1) can be proved
automatically. In the step case (x1 ↦ (A x1)), Usub from Lemma 11 is used to
infer (sub (split' x1) N) = (split' x1). In the step case (x1 ↦ (B x1)), a
straightforward generalization step is required by identifying two common
subexpressions in a proof subgoal. More precisely, by applying the induction hypothesis,
the induction conclusion is transformed into (split' (sub (split' x1) (B N))) =
(sub (split' (split' x1)) (B N)). Now, the occurrence of (split' x1) on the left-hand
side and the inner occurrence of (split' x1) on the right-hand side (underlined in the
original) are generalized to a fresh variable x, and then the proof works by induction on x.
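
The two versions of split can be compared with a small Python sketch. It rests on an encoding that is an assumption of this illustration: a monadic tree such as A (B (B N)) is written as the string "ABB", with the empty string standing for N, so that sub becomes string concatenation. The names `split_acc` and `split_prime` stand for split and split'.

```python
from itertools import product


def sub(x, y):
    """sub (A x1) y1 = A (sub x1 y1); sub (B x1) y1 = B (sub x1 y1); sub N y1 = y1."""
    if x == "":
        return y
    return x[0] + sub(x[1:], y)


def split_acc(x, y=""):
    """Accumulative split: the B's are collected in the context argument y."""
    if x == "":
        return y
    if x[0] == "A":
        return "A" + split_acc(x[1:], y)
    return split_acc(x[1:], "B" + y)


def split_prime(x):
    """Deaccumulated split': no context argument, sub does the plumbing."""
    if x == "":
        return ""
    if x[0] == "A":
        return "A" + sub(split_prime(x[1:]), "")
    return sub(split_prime(x[1:]), "B")


def all_trees(max_len):
    """Enumerate all monadic trees over A and B with at most max_len symbols."""
    for n in range(max_len + 1):
        for symbols in product("AB", repeat=n):
            yield "".join(symbols)


# Both versions agree, and split' is idempotent on every tree that was tried.
versions_agree = all(split_acc(t) == split_prime(t) for t in all_trees(6))
idempotent = all(split_prime(split_prime(t)) == split_prime(t) for t in all_trees(6))
```

For instance, `split_prime("BABA")` yields `"AABB"`, i.e., A^2 (B^2 N).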

B Example: Reversing Monadic Trees

The program

    rev (A x1) y1 = rev x1 (A y1)
    rev (B x1) y1 = rev x1 (B y1)
    rev N y1      = y1

with initial call (rev x1 N) is transformed into the program

    rev' (A x1) = sub (rev' x1) (A N)      sub (A x1) y1 = A (sub x1 y1)
    rev' (B x1) = sub (rev' x1) (B N)      sub (B x1) y1 = B (sub x1 y1)
    rev' N      = N                        sub N y1      = y1

with initial call (rev' x1). Taking into account that sub is just the concatenation
function on monadic trees, the above programs correspond to the efficient and
the inefficient reverse function, which have linear and quadratic time-complexity
in the size of the input tree, respectively. Thus, this example shows that the aim
of our technique contrasts with the aim of classical program transformations,
i.e., the efficiency is decreased, but the suitability for verification is improved: If
we want to show that the reverse of two concatenated lists is the concatenation
of the reversed lists in exchanged order, then the proof of (rev (sub x1 x2) N) =
(sub (rev x2 N) (rev x1 N)) again requires considerable generalization effort,
whereas (rev' (sub x1 x2)) = (sub (rev' x2) (rev' x1)) can be proved by a
straightforward induction on x1, exploiting Usub and Asub from Lemma 11.
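
The linear versus quadratic behaviour of rev and rev' can be made visible by counting rule applications. The Python sketch below is an illustration under assumptions of its own: monadic trees are encoded as strings over "A" and "B" (the empty string is N), and the counter `calls` and the helper `count_calls` are hypothetical names introduced here.

```python
calls = {"rev": 0, "rev_prime": 0, "sub": 0}


def sub(x, y):
    """Concatenation of monadic trees; one rule application per symbol of x, plus one."""
    calls["sub"] += 1
    if x == "":
        return y
    return x[0] + sub(x[1:], y)


def rev(x, y=""):
    """Accumulative reverse: rev (c x1) y1 = rev x1 (c y1); rev N y1 = y1."""
    calls["rev"] += 1
    if x == "":
        return y
    return rev(x[1:], x[0] + y)


def rev_prime(x):
    """Deaccumulated reverse: rev' (c x1) = sub (rev' x1) (c N); rev' N = N."""
    calls["rev_prime"] += 1
    if x == "":
        return ""
    return sub(rev_prime(x[1:]), x[0])


def count_calls(f, *args):
    """Run f and report its result together with the rule applications used."""
    for name in calls:
        calls[name] = 0
    result = f(*args)
    return result, dict(calls)


tree = "ABBAB"
result_acc, cost_acc = count_calls(rev, tree, "")
result_prime, cost_prime = count_calls(rev_prime, tree)
```

For a tree with n symbols, rev uses n + 1 rule applications, while rev' uses n + 1 applications of its own rules and n(n + 1)/2 applications of the sub rules.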
Abstract. In this paper, we study incentive compatible mechanisms
based on a linear pricing scheme for single-minded auction, a restricted
case of combinatorial auction in which each buyer desires a unique fixed
bundle of various commodities, improving the previous works [1,11,13]
on pricing bundles (i.e., payments of buyers).



1 Introduction 

With the rapid growth of electronic commerce, the interplay between several important
economic concepts, such as Game Theory [16], General Equilibrium [6,7], Mechanism
Design [12,16] and Auction Theory [10], and Computer Science [5,15,17]
has become more and more intensive, extensive, and fruitful. The practice of
electronic markets allows businesses, merchants and consumers of all types to
conduct business over the Internet at lightning-fast speed across the globe. The
extraordinary power of on-line trading systems in aggregating information of sellers and
buyers makes it possible to conduct business in varieties of trading models that
were only theoretical possibilities before the age of the Internet.

Model of Combinatorial Auction 

Combinatorial auction [14,4] is one such model that has been examined with increasing
intensity in recent years. Each buyer entering a combinatorial auction is
usually assumed to have a valuation function over all subsets of the traded
commodities, rating the value of each subset to him. Two schemes for pricing the
commodities are often used: one gives a price to each subset of commodities
(i.e., payment per buyer), and the other gives a price to each type of commodity,
where the price of a subset is the sum of the prices of the commodities in the
subset (i.e., price per commodity). In the former case, the pricing function is
often required to be subadditive: $p(B \cup C) \le p(B) + p(C)$. The latter is often
referred to as the linear pricing model since $p(B \cup C) = p(B) + p(C)$
if $B$ and $C$ are disjoint.
Even though the latter can be viewed as a special case of the former, in
practice, it is more widely applicable. Combinatorial auction involves the sale of
different types of commodities in one business transaction. In reality, it is rare
that they would all be available from one owner. Normally, each commodity has
its own independent market, and a linear pricing scheme for commodities is
a natural model under such circumstances. This will be the focus of our study.

One may argue that, by its very nature, combinatorial auction requires a
pricing mechanism for a bundle of commodities. Non-linear pricing is thus
inherent in the problem. We acknowledge that such cases do
exist. However, the linear pricing model for commodities is also a practically
useful model. Very often the commodities of a bundle cannot be purchased at once
from one merchant, and each has to be acquired at a market of its own. Though
we may not rule out such possibilities in all cases, there are cases where one may
have a problem with anti-trust law, as in the recent case of US vs. Microsoft
involving software packages. As another example, one may observe that travel
packages including air tickets, rooms and cars often get a discount from the individual
product providers (airlines, hotels, car-rental companies). Those products can be
viewed as traded through a separate market (and are indeed so in that many
airlines/hotels/car-rental companies set aside a block of their products for such
deals). Therefore, the linear pricing model is at least as important as the non-linear
pricing model in combinatorial auction.

Single-Minded Auction 

As a restricted case of combinatorial auction, single-minded auction, which specifies
that each buyer only desires a fixed bundle of commodities, rather than all the
possible combinations, has received more and more attention recently. Lehmann
et al. [11] first introduced the notion of single-minded auction. Mu'alem and
Nisan [13] studied a set of techniques that allow designing efficiently computable
incentive compatible mechanisms for single-minded auction. Further, Archer et
al. [1] showed an incentive compatible mechanism by randomized rounding that
achieves a $(1 + \varepsilon)$ approximation ratio to the value of the optimal allocation. Note
that in general, the optimal allocation problem (maximizing total valuations) is
NP-hard [11]. In addition, Chen et al. [2] studied the complexity of the existence
of Walrasian equilibrium in single-minded auction.

Note that all incentive compatible mechanisms previously designed are based 
on the payment per buyer, rather than the price per commodity. Therefore, in 
this paper, we follow their basic model of single-minded auction but focus on 
the linear pricing scheme. 

First, we consider the problem of how to interpret the payments of buyers
as linear prices of commodities. Specifically, for the general optimal allocation
algorithm and the greedy allocation algorithms [13], we discuss criteria for whether
such an interpretation exists. Secondly, we propose a mechanism that ensures
incentive compatibility and sets a price for each individual commodity. The price
of a bundle thus follows as the sum of the prices of the commodities in the bundle. In
addition, we compare the revenue generated by our mechanism to that generated
by GPS (Greedy Payment Scheme) [11] and the VCG mechanism [18,3,8], all of which are




Incentive Compatible Mechanism Based on Linear Pricing Scheme 






incentive compatible. We observe that the revenue generated by our mechanism
is higher than or equal to that of the other two in all the cases originally discussed in
[11]. Moreover, we demonstrate that the problem of maximizing the revenue of the
auctioneer for the case of regular price is NP-hard.

Outline of the Paper 

In Section 2, we briefly review the definitions of single-minded auction and the
characterization of incentive compatible mechanisms proposed in [13]. Next, in
Section 3, we discuss some criteria for the interpretation of buyers' payments as a
linear price vector. In Section 4, an incentive compatible mechanism based on a linear
pricing scheme is studied and the revenue problem is considered. We conclude our work in
Section 5 with remarks and future directions.

2 Preliminaries 

2.1 Single-Minded Auction 

We consider the model in which an auctioneer sells $m$ heterogeneous commodities
$\Omega = \{\omega_1, \dots, \omega_m\}$, with unit quantity each, to $n$ potential buyers
$O = \{O_1, \dots, O_n\}$. Each buyer $O_i$ has a privately known valuation function
$v_i : 2^{\Omega} \to \mathbb{R}^+ \cup \{0\}$ that describes his true values over the various subsets
of commodities. That is, for any $B \subseteq \Omega$, $v_i(B)$ is the maximal amount of money that
$O_i$ is willing to pay in order to win $B$. We say $O_i$ is a single-minded buyer if there
exists a basic bundle $\Omega_i = \{\omega_i^1, \dots, \omega_i^{q_i}\} \subseteq \Omega$, where $q_i$ is the number of
commodities in $\Omega_i$, and a real $v_i^* > 0$ such that for each $B \subseteq \Omega$, $v_i(B) = v_i^*$ if
$\Omega_i \subseteq B$, and $v_i(B) = 0$ otherwise. That is, the basic bundle $\Omega_i$ is the core that $O_i$
desires. In this paper, we assume that all buyers are restricted to be single-minded, and the
auctioneer knows all the basic bundles $\Omega_i$ in advance (this assumption is just for
the simplicity of our statements). We denote by $A = (\Omega; \Omega_1, v_1; \dots; \Omega_n, v_n)$ the
single-minded auction [11].

We consider direct revelation auction mechanisms. That is, each buyer $O_i$
submits a bid $b_i$ to the auctioneer. Note that in single-minded auction, each
buyer only needs to submit the value $b_i(\Omega_i)$ to the auctioneer. Unless stated
otherwise, we denote by $v_i$ the value $v_i(\Omega_i) = v_i^*$, and by $b_i$ the value $b_i(\Omega_i)$ in
the following discussions.

When receiving the bids $b = (b_1, \dots, b_n)$ from all buyers, the auctioneer
specifies the tuple $(X(b), P_X(b))$, in which

- $X(b) = (X_1, \dots, X_n)$ is an allocation of $\Omega$ to all buyers, where $X_i$ represents
  the collection of commodities allocated to $O_i$, and $X_i \cap X_j = \emptyset$ for all $i \ne j$.
  Assume without loss of generality that $X_i = \emptyset$ or $X_i = \Omega_i$.

- $P_X(b) = (P_1, \dots, P_n)$ is the payment vector of all buyers, where $P_i$ represents
  the payment that buyer $O_i$ pays to the auctioneer.

Note that the payment vector $P_X$ is also a function of the allocation $X$, which
changes with the allocation algorithm. In this paper, we assume that
$P_X$ satisfies the following voluntary participation condition, given the allocation
$X(b) = (X_1, \dots, X_n)$:

- $P_i \le b_i$, for all $X_i = \Omega_i$. (In this case, the utility of $O_i$ is defined by
  $u_i(X_i, P_i) = v_i(X_i) - P_i$.)

- $P_i = 0$, for all $X_i = \emptyset$. (In this case, the utility of $O_i$ is zero.)

Note that to maximize their utilities, buyers may not submit their valuations
truthfully. The strategy depends on the mechanism. We say
an auction mechanism is incentive compatible (or truthful) if any buyer's utility
is maximized by submitting his true valuation, i.e., $b_i = v_i$, for $1 \le i \le n$.

2.2 Characterization of Incentive Compatible Mechanisms 

Let $b_{-i}$ denote the tuple $(b_1, \dots, b_{i-1}, b_{i+1}, \dots, b_n)$, and let $(b_{-i}, b_i)$ denote
$(b_1, \dots, b_n)$. Given $b_{-i}$, if $O_i$ wins his basic bundle $\Omega_i$ when bidding $b_i$, we say
$b_i$ is a winning declaration for $O_i$. Otherwise, we say $b_i$ is a losing declaration.

Definition 1 An allocation algorithm $X$ is monotone if for any given bids $b_{-i}$
and a winning declaration $b_i$, any higher declaration $b_i' > b_i$ still wins.

Lemma 1 [13] Let $X$ be a monotone allocation algorithm. Then for any $b_{-i}$,
there exists a single critical value $c_i(X, b_{-i}) \in \mathbb{R}^+$ such that for all $b_i > c_i(X, b_{-i})$,
$b_i$ is a winning declaration, and for all $b_i < c_i(X, b_{-i})$, $b_i$ is a losing declaration.

Definition 2 The critical payment $P_X(b)$ associated with the monotone allocation
algorithm $X$ is defined by $P_i = c_i(X, b_{-i})$ if $X_i = \Omega_i$, and $P_i = 0$ otherwise.

Lemma 2 The critical payment $P_X(b)$ satisfies the voluntary participation condition.

Proof. The case of $X_i = \emptyset$ follows from the definition of critical payment directly.
As to the other case, $X_i = \Omega_i$, notice that buyer $O_i$ wins his basic bundle, so due
to Lemma 1, we know that $b_i \ge c_i(X, b_{-i}) = P_i$. □

The most important property of monotone allocation algorithms and critical
payments is the following theorem shown by Mu'alem and Nisan [13].

Theorem 1 [13] A mechanism is incentive compatible if and only if its allocation
algorithm is monotone and its payment is the critical payment.

In the rest of the paper, all our discussions refer to monotone allocation
algorithms with the associated critical payment. Therefore, all buyers bid their
valuations truthfully, i.e., $b_i = v_i$. Thus, for simplicity, the submitted bid of each
buyer is also denoted by $v_i$, $i = 1, \dots, n$.

3 Interpretation of Payments as Prices 

Mu’alem and Nisan [13] studied a set of techniques on the basis of Theorem 1 to 
design efficiently computable incentive compatible mechanisms for single-minded 
auction. Their mechanisms, however, are all based on the payment per buyer, 
rather than the price per commodity. The following example demonstrates this 
point clearly. 







Example 1 Three buyers bid for two commodities with basic bundles $\{\omega_1\}$,
$\{\omega_1, \omega_2\}$, $\{\omega_2\}$ at valuations 8, 10, 8 respectively. According to the Greedy Allocation
Algorithm based on the value ranking [13] (which we discuss in detail in Sect. 3.2),
buyer $O_2$ wins at critical value 8. Unfortunately, in this setting, there
are no prices of commodities that support the above allocation. That is,
no price vector $(p(\omega_1), p(\omega_2))$ satisfies the inequalities $p(\omega_1) \ge 8$,
$p(\omega_1) + p(\omega_2) \le 10$, and $p(\omega_2) \ge 8$ simultaneously.

Therefore, we are interested in how to interpret the payments of buyers
as the prices of commodities. Specifically, the interpreted price vector
$p = (p(\omega_1), \dots, p(\omega_m))$ should satisfy the following linear pricing scheme, given the
allocation $X(b) = (X_1, \dots, X_n)$ and payment $P_X(b) = (P_1, \dots, P_n)$:

- $p(B \cup C) = p(B) + p(C)$, for any $B, C \subseteq \Omega$ with $B \cap C = \emptyset$, i.e., the price of
  commodities is linear.

- $p(\Omega_i) = P_i$, for all $X_i = \Omega_i$.

- $p(\Omega_i) \ge b_i$ (i.e., $v_i$), for all $X_i = \emptyset$.

Note that the first condition above implies that $p(\emptyset) = 0$. Essentially, the linear
pricing scheme specifies that for any winner, his payment is equal to the
corresponding bundle's price, whereas all losers do not win
their basic bundles because the prices are too high.

Definition 3 Given allocation $X(b) = (X_1, \dots, X_n)$ and payment $P_X(b) =
(P_1, \dots, P_n)$, we say $P_X(b)$ can be interpreted as the prices of commodities if
there exists a vector $p = (p(\omega_1), \dots, p(\omega_m))$ that satisfies the above linear pricing
scheme.
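
The conditions of the linear pricing scheme are easy to test mechanically for a given candidate price vector. The following Python sketch is only an illustration: the function names `price_of` and `is_interpretation` and the dictionary-based data layout are assumptions made here, and exact (integer) prices are used so that the equality test for winners is meaningful.

```python
def price_of(bundle, prices):
    """Linear price of a bundle: the sum of the prices of its commodities."""
    return sum(prices[c] for c in bundle)


def is_interpretation(prices, bundles, bids, winners, payments):
    """Check p(bundle_i) = P_i for winners and p(bundle_i) >= b_i for losers."""
    for i, bundle in enumerate(bundles):
        if i in winners:
            if price_of(bundle, prices) != payments[i]:
                return False
        elif price_of(bundle, prices) < bids[i]:
            return False
    return True


# Data of Example 1 (buyers are indexed from 0, so buyer O_2 has index 1).
bundles = [{"w1"}, {"w1", "w2"}, {"w2"}]
bids = [8, 10, 8]
winners = {1}
payments = {1: 8}

# Every non-negative integer split of the payment 8 over the two commodities fails.
candidates = [{"w1": a, "w2": 8 - a} for a in range(9)]
feasible = [p for p in candidates
            if is_interpretation(p, bundles, bids, winners, payments)]
```

Here `feasible` stays empty: the two losers would need $p(\omega_1) \ge 8$ and $p(\omega_2) \ge 8$, which is incompatible with a total price of 8 for the winner's bundle. The finite search is only a sanity check of that argument, not a proof.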

Example 1 shows that not all payments can be interpreted as prices, even
if the payment is the critical payment. In addition, as we will see below, whether
payments can be interpreted as prices also depends on the allocation
algorithm.

3.1 Optimal Allocation 

In this subsection, we consider the optimal allocation algorithm OPT, that is,
the algorithm that outputs an allocation $X^*$ with maximal total valuation. Note
that several different allocations may achieve the same maximal value.
In the following we consider the fixed one that OPT outputs, i.e.,

$$X^* \in \arg\max_{X} \sum_{i=1}^{n} v_i(X_i).$$

It is easy to see that OPT is a monotone allocation algorithm. Let
$X_0 = \Omega - \bigcup_{i=1}^{n} X_i$ be the collection of commodities that are not allocated to buyers. Let
$O' = \{O_i \mid X_i = \Omega_i\}$ be the collection of winners, and $O'' = \{O_i \mid X_i = \emptyset\}$ be
the collection of losers.
Lemma 3 If for any $O_i \in O''$, $\Omega_i \cap X_0 \ne \emptyset$, then the critical payment
$P_{OPT} = (P_1, \dots, P_n)$ can be interpreted as the price $p = (p(\omega_1), \dots, p(\omega_m))$.

Proof. Note that for any $O_i, O_{i'} \in O'$, we must have $\Omega_i \cap \Omega_{i'} = \emptyset$. Therefore,
for all $O_i \in O'$, we may define $p(\omega_i^1) = P_i$, and $p(\omega_i^j) = 0$, $j = 2, \dots, q_i$. For
any $O_i \in O''$, we know that there exists a commodity $\omega_j \in \Omega_i \cap X_0$ that has
not been priced yet. Therefore we may assign a sufficiently large price to $\omega_j$
such that $p(\omega_j) \ge v_i$. Note that if there exists another buyer $O_{i'} \in O''$ such that
$\omega_j \in \Omega_{i'} \cap X_0$, then the price of $\omega_j$ should satisfy $p(\omega_j) \ge \max\{v_i, v_{i'}\}$.
The prices of the other commodities are determined arbitrarily. It is easy to see that the
prices defined above satisfy all the requirements of the linear pricing scheme.
Hence the lemma follows. □
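
The construction in this proof can be sketched in a few lines of Python. The sketch is an illustration under stated assumptions: the function name `lemma3_prices`, the list-and-dictionary layout, and the two-buyer instance are hypothetical, and commodities that the proof prices "arbitrarily" are simply given price 0.

```python
def lemma3_prices(bundles, values, winners, payments, unallocated):
    """Price commodities as in the proof of Lemma 3.

    bundles[i] is an ordered list of commodities, winners a set of buyer indices,
    payments[i] the critical payment of winner i, unallocated the set X_0.
    """
    prices = {c: 0 for bundle in bundles for c in bundle}
    for c in unallocated:
        prices.setdefault(c, 0)
    # Winners: the first commodity of the bundle carries the whole payment.
    for i in winners:
        prices[bundles[i][0]] = payments[i]
    # Losers: one unallocated commodity of the bundle is made expensive enough.
    for i, bundle in enumerate(bundles):
        if i in winners:
            continue
        free = [c for c in bundle if c in unallocated]
        if not free:
            raise ValueError("hypothesis of Lemma 3 fails for buyer %d" % i)
        prices[free[0]] = max(prices[free[0]], values[i])
    return prices


# Hypothetical instance: O_1 wants {w1, w2} at 10, O_2 wants {w1, w3} at 4.
# OPT lets O_1 win at critical payment 4, and w3 remains unallocated.
bundles = [["w1", "w2"], ["w1", "w3"]]
values = [10, 4]
prices = lemma3_prices(bundles, values, winners={0}, payments={0: 4},
                       unallocated={"w3"})
```

In this instance the winner's bundle costs exactly his payment 4, and the loser's bundle costs 8, which is at least his valuation 4.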

In the following, we consider the problem of interpretation from the point of
view of Walrasian equilibrium. Intuitively, a Walrasian equilibrium specifies an
allocation and a price vector such that any remaining commodity is priced at zero
and all buyers are satisfied with their corresponding allocations under the fixed
price vector. Formally,

Definition 4 (Walrasian Equilibrium) A Walrasian equilibrium of single-minded
auction $A$ is a tuple $(X, p)$, where $X$ is an allocation and $p \ge 0$ is a linear
price vector of all commodities, such that (i) $p(\Omega \setminus \bigcup_{i=1}^{n} X_i) = 0$, and (ii) for any
$O_i$, $u_i(X_i, p(X_i)) \ge u_i(B, p(B))$ for all bundles $B \subseteq \Omega$.

From the conclusion of [9], we know that the allocation $X$ in a Walrasian
equilibrium must be an optimal allocation.

Lemma 4 For any single-minded auction $A = (\Omega; \Omega_1, v_1; \dots; \Omega_n, v_n)$, if the
Walrasian equilibrium $(X^*, p)$ exists and $v_i = c_i(X^*, v_{-i})$ for all $O_i \in O'$, then
the critical payment can be interpreted as the price.

Proof. For any $O_i \in O''$, from the definition of Walrasian equilibrium, we know
that $p(\Omega_i) \ge v_i$. Otherwise, $O_i$ should be allocated $\Omega_i$ to get more utility.
For any winner $O_i \in O'$, we have $p(\Omega_i) \le v_i = c_i(X^*, v_{-i}) = P_i$. Hence we
may increase the price of any commodity in $\Omega_i$ such that $p(\Omega_i) = P_i$, which
complies with the requirements of the linear pricing scheme. In addition, note that
$\Omega_i \cap \Omega_{i'} = \emptyset$ for any $O_i, O_{i'} \in O'$, therefore such an increased price vector always
exists. Hence the lemma follows. □

We stress that in the above lemma, both conditions (i.e., the existence of a
Walrasian equilibrium and $v_i = c_i(X^*, v_{-i})$) are necessary to maintain the
property of interpretation. For example, if we remove the second condition, as
shown in Example 1, there exists a Walrasian equilibrium $(X^*, p)$, where $X^* =
(\{\omega_1\}, \emptyset, \{\omega_2\})$ and $p = (5, 5)$, but we cannot interpret the critical payment
$P_1 = P_3 = 2$, which is associated with the optimal allocation $X^*$, as the price.
Removing the first condition is similar, as the following example shows.

Example 2 Six buyers bid for three commodities. Buyers $O_1, O_2, O_3$
are interested in $\{\omega_1, \omega_2\}$, $\{\omega_2, \omega_3\}$, $\{\omega_3, \omega_1\}$ respectively at valuation 3 each. The
other three buyers desire $\{\omega_1\}$, $\{\omega_2\}$, $\{\omega_3\}$ respectively at valuation 1 each.
In this example, the Walrasian equilibrium does not exist, and all buyers indeed
bid their critical payments. However, we cannot interpret the critical payments
3 and 1 of the winners as the prices of commodities.
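
The critical payments quoted for Examples 1 and 2 can be recomputed by brute force. The Python sketch below is an illustration, not part of the mechanism: it assumes that under OPT the critical value of a winner equals the best total valuation achievable without him minus the best total valuation of the other buyers once his bundle is reserved, and the function names `max_welfare` and `critical_value_opt` are chosen here. The enumeration is exponential in the number of buyers and is meant for tiny instances only.

```python
from itertools import combinations


def max_welfare(values, bundles, buyers, blocked=frozenset()):
    """Best total valuation of pairwise disjoint bundles avoiding `blocked`."""
    best = 0
    for r in range(len(buyers) + 1):
        for chosen in combinations(buyers, r):
            if any(bundles[i] & blocked for i in chosen):
                continue
            if all(bundles[a].isdisjoint(bundles[b])
                   for a, b in combinations(chosen, 2)):
                best = max(best, sum(values[i] for i in chosen))
    return best


def critical_value_opt(i, values, bundles):
    """Smallest bid (up to ties) with which buyer i still wins under OPT."""
    others = [j for j in range(len(values)) if j != i]
    without_i = max_welfare(values, bundles, others)
    with_i = max_welfare(values, bundles, others, blocked=bundles[i])
    return max(0, without_i - with_i)


# Example 1: commodities 1, 2; buyers indexed from 0.
values1 = [8, 10, 8]
bundles1 = [{1}, {1, 2}, {2}]

# Example 2: commodities 1, 2, 3.
values2 = [3, 3, 3, 1, 1, 1]
bundles2 = [{1, 2}, {2, 3}, {3, 1}, {1}, {2}, {3}]

payments1 = [critical_value_opt(i, values1, bundles1) for i in (0, 2)]
payments2 = [critical_value_opt(i, values2, bundles2) for i in (0, 5)]
```

With these data, `payments1` is `[2, 2]` for the two winners of Example 1, and `payments2` is `[3, 1]` for the winners $O_1$ and $O_6$ of one optimal allocation in Example 2.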

3.2 Greedy Allocation Algorithms 

In the greedy allocation algorithms, the bids of buyers are first reordered
according to some monotone ranking criterion, then the commodities are allocated
greedily.

Definition 5 A ranking $r$ is a collection of $n$ real-valued continuous functions
$(r_1, \dots, r_n)$, where $r_i = r_i(v_i, \Omega_i)$ satisfies $r_i(0, \Omega_i) = 0$ and $r_i(\infty, \Omega_i) = \infty$. A
ranking is monotone if each $r_i$ is non-decreasing in $v_i$.

There are a number of ways to define the monotone ranking function $r$,
for example [11,13], the value ranking $r_i(v_i, \Omega_i) = v_i$, and the density ranking
$r_i(v_i, \Omega_i) = v_i / |\Omega_i|$.

Greedy Allocation Algorithm based on monotone ranking $r$ [13]

1. Winners ← ∅, AvailComm ← $\Omega$.

2. Reorder the bids by non-increasing value of $r_i(v_i, \Omega_i)$.

3. For $i = 1, \dots, n$ (in the new order, ties are broken arbitrarily):
   if $\Omega_i \subseteq$ AvailComm, then
   (i) Winners ← Winners ∪ $\{O_i\}$.
   (ii) AvailComm ← AvailComm − $\Omega_i$.

4. Return Winners.
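
The four steps above translate almost literally into Python. The sketch below is an illustration: the function names, the representation of bundles as sets, and the tie-breaking rule (buyers with equal rank keep their input order) are assumptions made here, since the algorithm allows ties to be broken arbitrarily.

```python
def value_ranking(value, bundle):
    """Value ranking: r_i(v_i, bundle_i) = v_i."""
    return value


def density_ranking(value, bundle):
    """Density ranking: r_i(v_i, bundle_i) = v_i / |bundle_i|."""
    return value / len(bundle)


def greedy_allocation(values, bundles, rank):
    """Return the indices of the winners in the order in which they are served."""
    # Step 2: sort by non-increasing rank; sorted() is stable, which fixes ties.
    order = sorted(range(len(values)), key=lambda i: -rank(values[i], bundles[i]))
    # Step 1: no winners yet, all commodities available.
    winners = []
    available = set().union(*bundles)
    # Step 3: serve buyers greedily in the new order.
    for i in order:
        if bundles[i] <= available:
            winners.append(i)
            available -= bundles[i]
    # Step 4.
    return winners


# Data of Example 1, buyers indexed from 0.
values = [8, 10, 8]
bundles = [{"w1"}, {"w1", "w2"}, {"w2"}]
by_value = greedy_allocation(values, bundles, value_ranking)
by_density = greedy_allocation(values, bundles, density_ranking)
```

On the data of Example 1, the value ranking lets only the buyer with index 1 (i.e., $O_2$) win, whereas the density ranking serves the buyers with indices 0 and 2.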



Lemma 5 [13] Any Greedy Allocation Algorithm based on a monotone ranking $r$
is monotone.

Theorem 2 For any Greedy Allocation Algorithm based on a monotone ranking
$r$, the critical payment cannot, in general, be interpreted as the price.

Proof. Our proof is constructive, that is, we show an instance of an auction for which
such an interpretation does not exist. Consider a fixed buyer $O_0$ with basic bundle
$\Omega_0$, $|\Omega_0| > 1$, and sufficiently large valuation $v_0$. Let $k = |\Omega_0|$ and $\Omega_0 =
\{\omega_{01}, \dots, \omega_{0k}\}$. We add another $2k$ buyers $O_1, \dots, O_k, O_{1'}, \dots, O_{k'}$ and $k$
commodities $\omega_1, \dots, \omega_k$. That is,

$$O = \{O_0, O_1, \dots, O_k, O_{1'}, \dots, O_{k'}\},$$

$$\Omega = \{\omega_{01}, \dots, \omega_{0k}, \omega_1, \dots, \omega_k\}.$$

Let $\bar{O} = \{O_1, \dots, O_k\}$. For any $O_i \in \bar{O}$, define the basic bundle $\Omega_i = \{\omega_{0i}, \omega_i\}$
and the valuation $v_i = \frac{v_0}{k} + 1$. For buyer $O_{i'}$, $1 \le i \le k$, define $\Omega_{i'} = \{\omega_i\}$
and $v_{i'} = \varepsilon$, where $0 < \varepsilon < 1$ is a sufficiently small real such that $r_{i'}(\varepsilon, \Omega_{i'}) <
r_i(v_i, \Omega_i)$ for all $1 \le i \le k$. Due to Definition 5, such a real $\varepsilon$ always
exists.
Assume to the contrary that in the above constructed instance, there exists a
price vector $p = (p(\omega_{01}), \dots, p(\omega_{0k}), p(\omega_1), \dots, p(\omega_k))$ that can be interpreted
from the critical payment $P = (P_0, P_1, \dots, P_k, P_{1'}, \dots, P_{k'})$. In the following we
consider the relations of the rankings $r_0, r_1, \dots, r_k$ of buyers $O_0, O_1, \dots, O_k$ respectively.

Case 1. $O_0$ wins, i.e., $r_0 > r_i$ for all $i = 1, \dots, k$. In this setting, according to
the greedy allocation algorithm, $O_0, O_{1'}, \dots, O_{k'}$ get their basic bundles. From
the requirements of the linear pricing scheme, we must have $p(\Omega_i) \ge v_i$ for all
$1 \le i \le k$. Note that $\Omega = \bigcup_{i=1}^{k} \Omega_i$, therefore,

$$p(\Omega) = \sum_{i=1}^{k} p(\Omega_i) \ge \sum_{i=1}^{k} v_i = k \cdot \left(\frac{v_0}{k} + 1\right) = v_0 + k. \tag{1}$$

On the other hand, since all commodities are sold to buyers $O_0, O_{1'}, \dots, O_{k'}$, we
have

$$p(\Omega) = p(\Omega_0) + \sum_{i=1}^{k} p(\Omega_{i'}) = P_0 + \sum_{i=1}^{k} P_{i'} \le v_0 + k\varepsilon, \tag{2}$$

which contradicts (1).

Case 2. At least one of $O_1, \dots, O_k$ wins, i.e., there exists $l$, $1 \le l \le k$, such
that $r_l > r_0$. Hence, $O_1, \dots, O_k$ win, and $O_0$ does not get his basic bundle $\Omega_0$.
Therefore, we have

$$p(\Omega_0) \ge v_0. \tag{3}$$

For any $O_i \in \bar{O}$, $i \ne l$, due to the property of the monotone ranking function
$r$, we know that there exists a sufficiently small real $\varepsilon_i > 0$ such that $r_i(\varepsilon_i, \Omega_i) =
r_{i'}(\varepsilon, \Omega_{i'})$, which implies that $\varepsilon_i$ is the critical payment of $O_i$, i.e., $P_i = \varepsilon_i$. Let
$\delta = \max\{\varepsilon_1, \dots, \varepsilon_{l-1}, \varepsilon_{l+1}, \dots, \varepsilon_k\}$. Trivially, $\delta$ is still a sufficiently small real.
Therefore, we have

$$v_0 \le p(\Omega_0) \le p(\Omega) = P_l + \sum_{i \ne l} P_i \le \frac{v_0}{k} + 1 + (k - 1)\delta < v_0, \tag{4}$$

where the last inequality is due to the sufficiently large value of $v_0$. A contradiction. □

4 Incentive Compatible Mechanism Based on Prices 

In this section, we propose an incentive compatible mechanism whose critical 
payment can be interpreted as the price. 

4.1 Greedy Allocation Algorithm Based on Cost Ranking 

First, we assume that for each commodity $\omega_j$, there is a positive real $d(\omega_j)$, the 
cost of $\omega_j$ to the auctioneer. Let $d(\omega_j) = e \cdot d_j$, $j = 1, \ldots, m$, where $e$ is the 
greatest common divisor of $d(\omega_1), \ldots, d(\omega_m)$. As with the price vector, we 
assume here that the cost of commodities is linear, i.e., $d(B \cup C) = d(B) + d(C)$ for any 
$B, C \subseteq \Omega$ with $B \cap C = \emptyset$. 
For each buyer $O_i$, we divide his submitted bid $v_i$ according to the proportion 
of the costs of the commodities $\omega^i_1, \ldots, \omega^i_{q_i}$ in his basic bundle $\Omega_i$. That is, the 
"bid" for $\omega^i_j$ is calculated as 

$$v_i(\omega^i_j) = v_i \cdot \frac{d(\omega^i_j)}{d(\omega^i_1) + \cdots + d(\omega^i_{q_i})}, \qquad j = 1, \ldots, q_i. \qquad (5)$$

It is easy to see that $v_i = v_i(\omega^i_1) + \cdots + v_i(\omega^i_{q_i})$. The following example shows 
this process clearly. 



Example 3 In the following figure, three buyers bid for three commodities 
with basic bundles $\{\omega_3\}$, $\{\omega_1, \omega_2\}$, $\{\omega_2\}$ at values 15, 24, 14 respectively. 
For instance, the bid of buyer $O_2$ is divided as follows: since $d(\omega_1) / d(\omega_2) = 2/4$, 
one third of the bid 24 is assigned to $\omega_1$ and the rest to $\omega_2$, i.e., 
$v_2(\omega_1) = 8$, $v_2(\omega_2) = 16$, as shown in the middle table of the figure. 
The bids of the other buyers are divided similarly. 
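
The division step (5) can be illustrated with a short Python sketch. The function name `divide_bid` and the dictionary representation of the bundle costs are assumptions made for this illustration, not part of the mechanism's definition; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def divide_bid(bid, bundle_costs):
    """Split `bid` over a basic bundle in proportion to commodity costs, as in (5).

    bundle_costs maps each commodity of the bundle to its (integer) cost d(w).
    """
    total = sum(bundle_costs.values())
    return {w: Fraction(bid) * Fraction(d, total) for w, d in bundle_costs.items()}

# Buyer O_2 of Example 3: bundle {w1, w2}, bid 24, costs d(w1) = 2 and d(w2) = 4.
shares = divide_bid(24, {"w1": 2, "w2": 4})
print(shares["w1"], shares["w2"])  # 8 16
```

The shares always sum to the original bid, which is the identity stated after (5).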
Fig. 1. Dividing Process & Cost Ranking 



After dividing the submitted bids for all buyers, the auctioneer adds another 
virtual commodity $\omega_0$ with cost $e$, and computes the "bid" $v_i(\omega_0)$ as 

$$v_i(\omega_0) = v_i \cdot \frac{e}{d(\omega^i_1) + \cdots + d(\omega^i_{q_i})}, \qquad i = 1, \ldots, n. \qquad (6)$$

Let the cost ranking function be $r_i = r_i(v_i, \Omega_i) = v_i(\omega_0)$. In Example 3, the cost 
rankings of the buyers are shown in the right table of Figure 1. 

The basic idea of the mechanism comes from the Vickrey auction [18]; in the 
following we describe it in detail. 



Greedy Allocation Algorithm based on cost ranking: 

1. Winners $\leftarrow \emptyset$, AvailComm $\leftarrow \Omega$. 

2. For $i = 1, \ldots, n$ 

   divide bid $v_i$, and compute the cost ranking $r_i$. 

3. Sort all rankings (wlog., assume $r_1 \ge \cdots \ge r_n \ge 1$). 

4. For $i = 1, \ldots, n$ 

   if $\Omega_i \subseteq$ AvailComm, then 

   (i) Winners $\leftarrow$ Winners $\cup \{O_i\}$. 

   (ii) AvailComm $\leftarrow$ AvailComm $- \Omega_i$. 

   else, stop and go to the next step. 

5. Return Winners. 
According to the mechanism above, in Example 3, $O_1$ and $O_2$ get their basic bundles 
with critical payments $P_1 = 10.5$, $P_2 = 21$, $P_3 = 0$, which can be interpreted 
as the prices $p(\omega_1) = 7$, $p(\omega_2) = 14$, and $p(\omega_3) = 10.5$. 
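
The allocation and payment rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name and data layout are hypothetical, bids and costs are taken to be integers so that $e$ can be computed with `math.gcd`, ties are broken by input order, and all rankings are assumed to be at least 1 as in step 3. The cost $d(\omega_3) = 3$ is not stated in the text (the figure is lost); it is inferred here from the reported price $p(\omega_3) = 10.5$ and the pricing rule $p(\omega_j) = r_{l+1} \cdot d_j$.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def greedy_cost_ranking(bids, cost):
    """bids: list of (bundle, value); cost: dict commodity -> positive integer cost.

    Returns (winners, payments, prices), with winners as indices into `bids`.
    """
    e = reduce(gcd, cost.values())
    # Equation (6): r_i = v_i * e / d(Omega_i).
    ranking = [Fraction(v * e, sum(cost[c] for c in b)) for b, v in bids]
    order = sorted(range(len(bids)), key=lambda i: ranking[i], reverse=True)

    available, winners, threshold = set(cost), [], None
    for i in order:
        bundle = bids[i][0]
        if bundle <= available:          # bundle still fully available
            winners.append(i)
            available -= bundle
        else:                            # first failure: stop (step 4)
            threshold = ranking[i]       # ranking r_{l+1} of the first loser
            break

    if threshold is None:                # every buyer wins: price equals cost
        price = {c: Fraction(d) for c, d in cost.items()}
    else:                                # p(w_j) = r_{l+1} * d_j with d_j = d(w_j) / e
        price = {c: threshold * Fraction(d, e) for c, d in cost.items()}

    payment = [Fraction(0)] * len(bids)
    for i in winners:
        payment[i] = sum(price[c] for c in bids[i][0])
    return winners, payment, price

# Example 3, with the assumed cost d(w3) = 3.
bids = [(frozenset({"w3"}), 15), (frozenset({"w1", "w2"}), 24), (frozenset({"w2"}), 14)]
winners, payment, price = greedy_cost_ranking(bids, {"w1": 2, "w2": 4, "w3": 3})
print(winners, [float(p) for p in payment])  # [0, 1] [10.5, 21.0, 0.0]
```

Each winner pays the price of its bundle, so the payments are linear in the commodity prices by construction.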

Lemma 6 The above Greedy Allocation Algorithm based on cost ranking is 
monotone. 

Proof. Note that the costs of all commodities are fixed, and according to (6), 
we know that $r_i(v_i', \Omega_i) > r_i(v_i, \Omega_i)$ for any $v_i' > v_i$. That is, the cost ranking 
is monotone. Therefore, if $v_i$ is a winning declaration, given the other buyers' bids 
$v_{-i}$, $v_i'$ is also a winning declaration for $O_i$. □ 



Lemma 7 The critical payment associated with the Greedy Allocation Algorithm 
based on cost ranking can be interpreted as the price. 

Proof. Note that according to the algorithm, we have 

$$r_1(v_1, \Omega_1) \ge r_2(v_2, \Omega_2) \ge \cdots \ge r_n(v_n, \Omega_n) \ge 1.$$

First, we consider the case that buyers $O_1, \ldots, O_l$ win their basic bundles, 
where $1 \le l < n$. For each $\omega_j \in \Omega$, define $p(\omega_j) = r_{l+1} \cdot d_j$. 

Due to the critical payment scheme and the mechanism, for each $1 \le i \le l$, 
we know that $r_i(v_i, \Omega_i) \ge r_i(P_i, \Omega_i) = r_{l+1}$, where $P_i$ is the critical payment of 
buyer $O_i$, which implies that 

$$P_i \cdot \frac{e}{d(\omega^i_1) + \cdots + d(\omega^i_{q_i})} = r_{l+1}.$$

Therefore, 

$$P_i = r_{l+1} \cdot \frac{d(\omega^i_1) + \cdots + d(\omega^i_{q_i})}{e} = p(\omega^i_1) + \cdots + p(\omega^i_{q_i}) = p(\Omega_i).$$

For any loser $O_i$, $l < i \le n$, we know that $r_i(v_i, \Omega_i) \le r_{l+1}$, i.e., 

$$r_{l+1} \ge v_i \cdot \frac{e}{d(\omega^i_1) + \cdots + d(\omega^i_{q_i})}.$$

Therefore, 

$$v_i \le r_{l+1} \cdot \frac{d(\omega^i_1) + \cdots + d(\omega^i_{q_i})}{e} = p(\omega^i_1) + \cdots + p(\omega^i_{q_i}) = p(\Omega_i).$$

That is, the linear pricing scheme holds. 

As to the case $l = n$, i.e., all buyers win their basic bundles, we may define the 
price of each commodity to be its cost, i.e., $p(\omega_j) = d(\omega_j)$. Hence, the critical 
payment of each buyer is the cost of the corresponding basic bundle. 

From the above arguments, we know that a price vector that complies with 
the linear pricing scheme and can be interpreted from the critical payment 
indeed exists. □ 



From the above two lemmas and Theorem 1, we have the following conclusion. 

Theorem 3 The Greedy Allocation Algorithm based on cost ranking, together 
with the critical payment, is an incentive compatible mechanism, and the payment 
can be interpreted as the price. 
4.2 Revenue Consideration 

In the following we consider the revenue, $r = \sum_{i=1}^{n} P_i$, that the auctioneer 
receives. Note that for our mechanism shown in the above subsection, 
$r = \sum_{i=1}^{n} P_i = \sum_{i=1}^{l} p(\Omega_i)$. In Example 3, $r = 31.5$ according to our mechanism. 
The Greedy Payment Scheme (GPS) [11] will allocate $\{\omega_2\}$ to $O_3$ at payment 12 and 
$\{\omega_3\}$ to $O_1$ at payment zero, hence $r = 12$. In addition, in the VCG mechanism 
[18,3,8], $O_1$ pays zero for $\{\omega_3\}$ and $O_2$ pays 14 for the bundle $\{\omega_1, \omega_2\}$, thus $r = 14$. 
Let us look at another example. 

Example 4 Assume there are three buyers and four commodities $\{\omega_1, \omega_2, \omega_3, \omega_4\}$. 
$O_1$ desires bundle $\{\omega_1, \omega_2\}$ at 20, $O_2$ is interested in $\{\omega_3\}$ at 8, and $O_3$ desires 
$\{\omega_1, \omega_4\}$ at 12. Assume the cost of each commodity is constant one. Our mechanism 
allocates $\{\omega_1, \omega_2\}$ to $O_1$ and $\{\omega_3\}$ to $O_2$ at price 6 for each commodity, 
hence $r = 18$. The allocation under GPS and VCG is the same as above, with 
revenue $r = 12$ and $r = 4$ respectively. 

Note that the price generated by our mechanism has the regular property 
$p(\omega_1) : \cdots : p(\omega_m) = d(\omega_1) : \cdots : d(\omega_m)$. As we will see in the following, if we do 
not consider the utilities of buyers, even under this restriction, the problem of 
maximizing the revenue of the auctioneer in single-minded auction is NP-hard. 

Definition 6 (Maximal Revenue Problem) Given a constant $k > 0$ and a single-minded 
auction, do there exist a regular price vector and trading buyers such that 
the revenue of the auctioneer is at least $k$? 

Theorem 4 Maximal Revenue Problem is NP-hard. 

Proof. We reduce from the EXACT COVER problem: given a finite set $N = 
\{1, \ldots, m\}$ and a family of subsets $S = \{s_1, \ldots, s_n\}$ of $N$, we are asked 
whether there exists a subfamily $S' \subseteq S$ such that every element of $N$ lies in 
exactly one element of $S'$. 

We construct the following instance of single-minded auction, in which there 
are $m$ commodities and $n$ buyers. Let each $j \in N$ correspond to a commodity $\omega_j$ with 
unit cost. For any $s_i \in S$, we take $s_i$ as the basic bundle of buyer $O_i$, with 
valuation $v_i(s_i) = \frac{k}{m} \cdot |s_i|$, where $k > 0$ is any constant. It is then easy to see 
that there exists a subfamily $S' \subseteq S$ such that every element of $N$ lies in exactly 
one element of $S'$ if and only if there exists a subset of buyers $O' \subseteq O$ such that 
every commodity is sold to exactly one buyer in $O'$ and all commodities are sold 
out, and equivalently, the auctioneer has revenue $k$. 
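
The reduction can be checked on small instances with a brute-force Python sketch. It assumes valuations of the form $v_i = \frac{k}{m} |s_i|$, which is one reading of the formula in the construction; the function names are hypothetical. With unit costs a regular price vector is a uniform price $p$, and a buyer can trade only if $p \cdot |s_i| \le v_i$, i.e. $p \le k/m$, so the sketch fixes $p = k/m$ and searches over sets of buyers with pairwise disjoint bundles.

```python
from fractions import Fraction
from itertools import combinations

def best_regular_revenue(m, subsets, k):
    """Best revenue at uniform price k/m over sets of buyers with disjoint bundles."""
    price = Fraction(k, m)              # highest uniform price every buyer can afford
    best = Fraction(0)
    for size in range(1, len(subsets) + 1):
        for chosen in combinations(subsets, size):
            sold = [x for s in chosen for x in s]
            if len(sold) == len(set(sold)):   # no commodity is sold twice
                best = max(best, price * len(sold))
    return best

def has_exact_cover(m, subsets, k=1):
    """Revenue k is reached exactly when all m commodities are sold once each."""
    return best_regular_revenue(m, subsets, k) == k

print(has_exact_cover(4, [{1, 2}, {3, 4}, {2, 3}]))  # True
print(has_exact_cover(4, [{1, 2}, {2, 3}]))          # False
```

The search is exponential in the number of buyers, which is consistent with the hardness claim; it is meant only to make the equivalence concrete.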

5 Conclusion and Further Research 

In this paper, we study incentive compatible mechanisms for the model of single-minded 
auction, a restricted case of combinatorial auction. Our work focuses 
on the interpretation of the payments of buyers as the prices of commodities, and on the 
design of an incentive compatible mechanism based on a linear pricing scheme. We 
believe that our work is an important step in the quest for incentive compatible 
mechanisms based on linear pricing schemes in general combinatorial auctions. 
Abstract. The hierarchical process structure of Petri nets can be modelled 
by languages of series-parallel posets. We show how to extract this 
structure from a 1-safe Petri net. The technique also applies to represent 
1-safe S-systems [11] and communication-free systems [5] in terms of 
structured programs with cobegin-coend. We also define SR-systems, a 
class of 1-safe Petri nets which exactly represents programs of this kind. 



Let A be a finite nonempty alphabet. A finite-state automaton over A is an ab- 
stract version of flowcharts — think of the letters of the alphabet as representing 
assignments and tests. Rational (or regular) expressions are an abstract version 
of the control mechanisms of a “structured” programming language. Kleene’s 
theorem thus ties together flowchart and program construct representations of 
a sequential programming language. (By using the power of assignments, the 
Böhm-Jacopini theorem shows that one loop is sufficient.) 

1-safe Petri nets [16] can be thought of as representing the flowchart structure 
of a concurrent program. The state of the program is distributed into a set of 
places and the transitions can be forking (with more than one target place) 
or joining (with more than one source place) or both. These can be used to 
model the hierarchical structure of processes as well as their interaction with 
each other. One can seek therefore a Kleene theorem which represents a Petri 
net as a program in an abstract concurrent programming language. 

Earlier Work. Several authors have defined expressions for nets. The earliest 
work we know of is by Grabowski [10] who defined expressions for poset lan- 
guages of 1-safe net systems. (In this paper, we only deal with posets which are 
finite and labelled — these are also known as pomsets [17].) Garg and Ragunath 
[6] defined “concurrent regular expressions” to describe word languages of Petri 
nets. The Petri Box Calculus (PBC) [1] is a rich formalism for describing 
concurrent and branching behaviour of Petri nets, following much work in process 
algebra [9,8,20,3,15]. 

* The first author acknowledges partial support from the Indo-French project IFC- 
PAR/CEFIPRA 2102-1 for presenting a first version of the ideas in this paper in the 
Workshop on Logic and Algebra in Concurrency, Dresden, 2000. 
A disadvantage of Grabowski’s “Kleene-like” theorem (as also that of Garg 
and Ragunath) is that the “expressions” allow renaming and hiding operations. 
Hiding is renaming to an empty poset, and can be represented as manipulation 
of a local variable in a programming language. For example, the process algebra 
community usually models the sequential composition of two programs E1;E2 
by using a shared variable (called tick below) and synchronization so that the 
poset corresponding to a run of E1 precedes that corresponding to the run of E2. 

    new shared var tick in 
      cobegin E1 (* has assignments to tick to signal termination *) 
           || E2 (* has tests of tick to signal starting *) 
      coend 

This introduces parallelism in program structure where there is none in the 
semantics. In our view, shared variables are acceptable to represent inter-process 
communication, but they should not be introduced where no process structure is 
required. Similarly, iteration (looping) is a natural programming structure and 
simulating it by prefixing and recursion is not acceptable. 

Ochmański’s co-rational expressions [14] provide syntax for Mazurkiewicz 
trace languages [4], which are another popular way of describing poset behaviour 
of 1-safe Petri nets. Again, Ochmański’s co-iteration does not correspond to a 
natural programming language construct. 

Our Approach. Whether the full class of 1-safe net behaviours can be repre- 
sented in terms of programming constructs remains to be seen. In this paper we 
model the forking and joining transitions which are used only for hierarchical 
ordering of processes by series-rational expressions [13], which can be thought 
of as while programs with a cobegin-coend construct. We do not deal with the 
representation of communication between processes. 

More precisely, we show that the series-parallel posets accepted by 1-safe nets 
form exactly the series-rational languages, and we define SR-systems, a class of 
1-safe Petri nets such that the posets they accept are exactly these languages. 

Other abstract ways of describing “finite-state” languages are by morphisms 
from the language of terms into a finite algebra, or by using a run as a model over 
which logical formulas can be evaluated. We use earlier work to show equivalence 
to this kind of algebraic recognizability and logical definability. 

Thus we have a Kleene-Myhill-Nerode-Büchi description. Renaming and 
especially hiding are not easy operations to model in the algebraic and logical 
frameworks and this is a technical reason we avoid using them. 

One of the nice features about Petri nets is the structural characterizations 
that are available for some significant subclasses. Prominent among these are 
S-systems, T-systems and free choice (FC) systems [11]. 

We characterize two subclasses of nets which fall within the ambit of sp-languages: 
S-systems and communication-free (CF) systems [5]. We define two 
subclasses of sp-languages, the right and outermost sp-languages, and show that 
the rsp- and osp-languages accepted by 1-safe nets can be accepted by 1-safe CF 
and S-systems respectively. They are also characterized in terms of algebra, logic 
and syntactic expressions to obtain a Kleene-Myhill-Nerode-Büchi description. 
1 Preliminaries: sp-Terms and Their Runs on Nets 

A (labelled) net is a tuple $(P, T, F, \ell)$ where $P$ is a set of places, $T$ a (disjoint) 
set of transitions, $\ell : T \to A$ a function labelling the transitions and 
$F \subseteq (P \times T) \cup (T \times P)$ is the flow relation. 

For a place or transition $x$, its pre-set $F^{-1}(x)$ is conventionally denoted ${}^\bullet x$ 
and its post-set $F(x)$ is denoted $x^\bullet$, and their union is called the neighbourhood 
of $x$. A net is said to be unbranched if for all its places $p$, $|{}^\bullet p| \le 1$ and $|p^\bullet| \le 1$. 

We will assume that $F$ satisfies the condition that for each transition $t$, ${}^\bullet t$ 
and $t^\bullet$ are nonempty, and for each place $p$, either ${}^\bullet p$ or $p^\bullet$ is nonempty. A net 
is called acyclic if the relation $F^*$ is antisymmetric. 

A (labelled) net system is a tuple $N = (P, T, F, \ell, M_0)$, where $M_0 : P \to \mathbb{N}$ 
is an initial marking. The run of a net system is described by a “token game” 
leading from the initial marking to other markings [16]. Formally, a marking is 
a function $M : P \to \mathbb{N}$. 

A net system is said to be 1-safe if for all markings $M$ reachable from the 
initial marking and for all places $p$, we have $M(p) \le 1$. Now markings can be 
interpreted as 0/1-vectors, or equivalently as subsets of places. We will therefore 
use $M \subseteq P$ instead of $M : P \to \mathbb{N}$ to describe markings of 1-safe net systems. 

Our definition of a “run” of a net system is known in net theory as a 
“non-sequential process”. To “accept” finite runs of net systems as in automata theory, 
we define an accepting 1-safe net system to be a tuple $\mathcal{N} = (N, \mathcal{F})$, where $N$ is 
a 1-safe net system and $\mathcal{F} \subseteq \wp(P)$ is a set of final markings. 

Let $\mathcal{N} = (P, T, F, \ell, M_0, \mathcal{F})$ be an accepting 1-safe net system and 
$O = (P', T', F', \ell')$ an unbranched, acyclic 1-safe net. $O$ is a (terminal) process of $\mathcal{N}$ if 
it can be mapped onto the accepting system: formally, we require two functions 
$\pi : P' \to P$ and $\pi : T' \to T$ (the context will make clear which function is 
meant) satisfying the following properties: 

- for all $t$ in $T'$, $\pi({}^\bullet t) = {}^\bullet \pi(t)$, $\pi(t^\bullet) = \pi(t)^\bullet$ and $\ell(\pi(t)) = \ell'(t)$ (transition 
  neighbourhood and labelling respected); 

- $\pi(\{p \in P' \mid {}^\bullet p = \emptyset\}) = M_0$ and $\pi(\{p \in P' \mid p^\bullet = \emptyset\}) \in \mathcal{F}$ (initial and final 
  markings respected). 

The derived tuple $(T', (F')^* \cap (T' \times T'), \ell')$ is a poset, and we will say that 
the net system $\mathcal{N}$ accepts this poset. The set of all posets $\mathcal{N}$ accepts is called 
$PL(\mathcal{N})$, the poset language of $\mathcal{N}$. 

1.1 The sp-Runs of a Net System 

We now restrict attention to the subclass of labelled N-free posets, that is, those 
nonempty posets which satisfy 

$$\forall w, x, y, z\, (w < y,\ w < z,\ x < z,\ w \mathrel{co} x,\ y \mathrel{co} z \text{ implies } x < y) \qquad (1)$$

Here $a \mathrel{co} b$ says that $a$ and $b$ are unordered. It has been shown [10,7] that these 
posets are described inductively using a set of series-parallel terms $SP(A)$ over 
the alphabet $A$, given by the syntax $u ::= a \in A \mid u_1 u_2 \mid u_1 \parallel u_2$, which we call 
sp-terms. (Note that there is no way of describing the empty poset.) 
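
Condition (1) can be checked directly on a small finite poset. The following Python sketch is a brute-force illustration; the function name and the representation of the strict order as a transitively closed set of pairs are assumptions made for this example.

```python
from itertools import product

def is_n_free(elements, less):
    """Check condition (1). `less` is a set of pairs (x, y) meaning x < y,
    assumed to be a transitively closed strict partial order on `elements`."""
    def lt(a, b):
        return (a, b) in less

    def co(a, b):  # a and b are distinct and unordered
        return a != b and not lt(a, b) and not lt(b, a)

    for w, x, y, z in product(elements, repeat=4):
        if lt(w, y) and lt(w, z) and lt(x, z) and co(w, x) and co(y, z):
            if not lt(x, y):
                return False  # the four elements form an "N"
    return True

# The N-shaped poset: w < y, w < z, x < z, and nothing else.
print(is_n_free("wxyz", {("w", "y"), ("w", "z"), ("x", "z")}))              # False
# The poset of the sp-term (a||b)(c||d).
print(is_n_free("abcd", {("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")}))  # True
```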
We use this result to inductively describe those runs of a net system which 
correspond to N-free posets. The $M_1 [u\rangle M_2$ notation standard in net theory is used. 
Of these, the ones from the initial marking to a final one form $PL(\mathcal{N}) \cap SP(A)$. 

We say $M_1 [a\rangle M_2$ holds if for some $a$-labelled transition $t$ such that ${}^\bullet t \subseteq M_1$, 
$(M_1 \setminus {}^\bullet t) \cup t^\bullet = M_2$. 
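
The atomic step $M_1 [a\rangle M_2$ can be sketched in Python with markings of a 1-safe net represented as sets of places. The function name `fire` and the encoding of transitions as `(label, preset, postset)` triples are assumptions made for this illustration.

```python
def fire(marking, label, transitions):
    """Return every marking M2 with M1 [label> M2, for M1 = `marking`.

    transitions is an iterable of (label, preset, postset) triples whose
    pre-sets and post-sets are frozensets of places.
    """
    successors = set()
    for lab, pre, post in transitions:
        if lab == label and pre <= marking:          # pre-set contained in M1
            successors.add(frozenset((marking - pre) | post))
    return successors

# A forking transition labelled a: it consumes p1 and marks p2 and p3.
net = [("a", frozenset({"p1"}), frozenset({"p2", "p3"}))]
print(fire(frozenset({"p1"}), "a", net) == {frozenset({"p2", "p3"})})  # True
```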

The term $u \parallel v$ gives the parallel composition (disjoint union) of the posets $u$ 
and $v$, leaving their elements unordered with respect to each other. $M_1 [u \parallel v\rangle M_2$ 
holds if each $M_i$ can be split into disjoint components $M_i^u$ and $M_i^v$ such that 
$M_1^u [u\rangle M_2^u$ and $M_1^v [v\rangle M_2^v$. 

The term $uv$ stands for the poset formed by sequencing the posets $u$ and $v$, 
that is, ordering all elements of $u$ before those of $v$. We say that a marking $M$ is 
connected if it cannot be decomposed into two markings $M_1$ and $M_2$ which can 
independently enable transitions. $M_1 [uv\rangle M_2$ holds if there is an intermediate 
connected marking $M$ such that $M_1 [u\rangle M$ and $M [v\rangle M_2$. 

Without the connectedness condition on intermediate markings, a run on 
$ab \parallel cd$ could also be seen as a run on $(a \parallel c)(b \parallel d)$, and these two are distinct poset 
behaviours. 

We have implicitly used the fact that sequencing and parallel composition 
are associative and that parallel composition is also commutative. 

We do not know of any structural definition which will restrict nets to exactly 
those which have N-free behaviour. 

1.2 sp-Algebras and sp-Languages 

More generally, an sp-algebra is a set S equipped with the binary associative 
operations of sequential product and parallel composition, such that parallel 
composition is also commutative. SP{A) is the free sp-algebra over generators 
A, built from the letters of A by these operations. In this paper, our interest 
will be restricted to those poset languages which can be seen as sp-languages, 
subsets of $SP(A)$ [13]. 

We define the width of an sp-term to be the maximum number of nested parallel 
operations occurring in it plus one. For instance, the term $a(b \parallel c(a \parallel b))$ has width 
three. Intuitively, width provides an upper bound on the number of processors 
required for a maximally parallel implementation of the poset corresponding to 
an sp-term. The width of an sp-language is the supremum of the widths of its 
terms. 

We will be interested in two subclasses of SP{A). 

The outermost parallel sp-terms OSP{A) are those where no parallel compo- 
sition is nested inside a sequential product. OSP{A) characterizes finite posets 
which are disjoint unions of chains. 

The right parallel sp-terms $RSP(A)$ are those where a subterm of the form 
$(u_1 \parallel u_2)u_3$ is disallowed, that is, only the right multiplicand of a sequential product 
is allowed to be a parallel composition. $RSP(A)$ characterizes finite join-free 
posets, that is, those posets where there do not exist distinct elements $x$, $y$, $z$ 
such that $x < z$, $y < z$ and $x$, $y$ are unordered. 

Clearly $A^+ \subset OSP(A) \subset RSP(A) \subset SP(A)$. If an sp-language is a subset 
of $RSP(A)$, we call it an rsp-language; osp-languages are similarly defined. 
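
The width and the two subclasses can be made concrete with a small Python sketch. The term encoding (strings for letters, tagged tuples for the two products) and all function names are assumptions made for this illustration. The width is computed additively over parallel composition and as a maximum over sequential product; this is one reading of the definition above, chosen because it gives width three for $a(b \parallel c(a \parallel b))$ and matches the processor-count intuition. Nested products are flattened first, since both operations are associative.

```python
def seq(u, v):
    return ("seq", u, v)

def par(u, v):
    return ("par", u, v)

def width(t):
    """Width of an sp-term: letters have width 1, parallel composition adds
    widths, sequential product takes the maximum."""
    if isinstance(t, str):
        return 1
    op, u, v = t
    return width(u) + width(v) if op == "par" else max(width(u), width(v))

def factors(t, op):
    """Flatten nested uses of the associative operator `op`."""
    if isinstance(t, tuple) and t[0] == op:
        return factors(t[1], op) + factors(t[2], op)
    return [t]

def is_rsp(t):
    """Right parallel: in a sequential product only the last factor may be parallel."""
    if isinstance(t, str):
        return True
    if t[0] == "par":
        return all(is_rsp(u) for u in factors(t, "par"))
    *init, last = factors(t, "seq")
    return all(isinstance(u, str) for u in init) and is_rsp(last)

def is_osp(t):
    """Outermost parallel: a parallel composition of chains of letters."""
    return all(all(isinstance(a, str) for a in factors(c, "seq"))
               for c in factors(t, "par"))

term = seq("a", par("b", seq("c", par("a", "b"))))   # a(b || c(a || b))
print(width(term), is_rsp(term), is_osp(term))       # 3 True False
```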




Hierarchical Structure of 1-Safe Petri Nets 






A zero element in an sp-algebra absorbs both operations. If it exists, it is 
unique and is written 0. We say an sp-algebra is right zero if it has a zero and 
for any parallel term $x_1 \parallel x_2$, right multiplying it by any $x_3$ yields zero, that is, 
$(x_1 \parallel x_2)x_3 = 0$. An sp-algebra is outermost zero if it has a zero and multiplying 
any parallel term on the left or the right yields zero. 

As usual, a morphism between sp-algebras is a map preserving the two operations. 
An sp-language $L$ is recognized by an sp-algebra $S$ if there is a morphism 
$\varphi : SP(A) \to S$ such that $L = \varphi^{-1}\varphi(L)$. If $S$ is finite, we say $L$ is recognizable. 

Proposition 1 Let $L$ be a recognizable sp-language. Then $L$ is an rsp-language 
(osp-language) if and only if it is recognized by an sp-algebra $S$ such that $S$ is 
right zero (outermost zero, respectively) and 0 is not in the image of $L$. 

1.3 Monadic Second-Order Logic 

We now turn to a different view of labelled posets, seeing them as models of a 
logical language $MSO[<, A]$. The atomic formulas of this logic are $\ell(x) = a$ (for 
$a \in A$), $x < y$ and $x = y$. Apart from the usual first-order variables (such as $x$, 
$y$ above) interpreted as elements of the poset under consideration, we work with 
monadic second-order variables, interpreted over subsets of poset elements. The 
formulas are closed under boolean operations, first-order and monadic second-order 
quantification. The definition of when the poset $u$ satisfies the formula $\alpha$ 
($u \models \alpha$) is standard. 

Any (MSO) formula $\alpha$ defines the language of all the posets which satisfy it, 
$PL(\alpha) = \{u \mid u \models \alpha\}$. Call such a language (MSO-)definable. 

For instance, the description (1) can be written as a (first-order) formula 
(along with a formula specifying a nonempty poset), defining the poset language 
$SP(A)$ of all N-free posets. Similarly, the languages $RSP(A)$ and $OSP(A)$ are 
definable using the formulas 

$$\delta_{rsp} = \forall x, y, z\, (x < z \text{ and } y < z \text{ implies } x \le y \text{ or } y \le x)$$

$$\delta_{osp} = \delta_{rsp} \text{ and } \forall x, y, z\, (x < y \text{ and } x < z \text{ implies } y \le z \text{ or } z \le y)$$

Hence, by conjoining a formula $\alpha$ with that for (say) $SP(A)$, we get the definable 
sp-languages. MSO-definability of sp-languages was studied by Kuske [12], who 
showed that the recognizable sp-languages coincide with the MSO-definable ones. 

Proposition 2 The recognizable rsp-languages (osp-languages) of bounded 
width equal the MSO-definable rsp-languages (osp-languages) of bounded width. 

2 Operations Corresponding to Programming Constructs 

Our basic syntax will be the series-rational expressions, which can be seen as an 
abstract representation of structured programs with cobegin-coend statements. 
These expressions will be interpreted as sp-languages. 

- Every $a \in A$ is a rational expression. If $r_1$ and $r_2$ are rational expressions, 
  then so are $r_1 \cup r_2$, $r_1 r_2$, $r_1^+$. 
- Every rational expression is a series-rational expression. If $e_1$ and $e_2$ are 
  series-rational expressions, then so are $e_1 \cup e_2$, $e_1 e_2$, and $e_1 \parallel e_2$. 

The rational expressions are exactly the same as the (regular) expressions for 
words, except that they are interpreted over posets. Adding parallel composition 
(cobegin-coend) gives the series-rational expressions. 

We again identify two subclasses of interest. An expression with parallel 
composition only at the outermost level is said to be outermost series-rational. 
This corresponds to one single cobegin-coend inside which we have a while 
program for each process. 

An expression where parallel composition is only allowed on the right hand 
side of a sequential product is said to be right series-rational. (More precisely, the 
right series-rational expressions are defined by the syntax below. The outermost 
series-rational expressions can be similarly defined.) In terms of programming 
languages, these are structured programs with cobegin-coend such that all the 
coend’s occur at the end of the program. Processes can be “forked” to make 
the process structure grow, but only the end of the program terminates all the 
processes. 

$$r ::= a \mid r_1 \cup r_2 \mid r_1 r_2 \mid r_1^+, \qquad e ::= r \mid r e_1 \mid e_1 \parallel e_2$$

Now, we associate with each expression an sp-language. $L(a)$ is the singleton 
set consisting of the term $a$, $L(e_1 \cup e_2) = L(e_1) \cup L(e_2)$, $L(e_1 e_2) = L(e_1)L(e_2)$, 
$L(e_1^+) = L(e_1)^+$ (Kleene iteration) and $L(e_1 \parallel e_2) = L(e_1) \parallel L(e_2)$. The operations 
on the right hand side are performed pointwise on sp-terms. $L^+$ is defined as 
usual inductively, to give the concatenation of $L$ any number of times. 

Finally, an sp-language over $A$ is said to be series-rational if it is of the form 
$L(e)$ for some series-rational expression $e$ over $A$. We similarly define the 
outermost and right series-rational languages. The paper [13] shows that the 
series-rational languages are the recognizable sp-languages of bounded width. 

From the definitions, it follows that the outermost series-rational languages 
are subsets of OSP{A) and the right series-rational languages are subsets of 
RSP{A). Conversely, if a series-rational language is a subset of OSP{A)/RSP{A), 
we can find an outermost/right series-rational expression generating it. 

Proposition 3 The right (outermost) series-rational languages are the 
recognizable rsp- (osp-)languages of bounded width. 

Putting Proposition 3 and Proposition 2 together gives us algebraic and 
logical characterizations for our expressions. In the rest of the paper, we tie 
these up with Petri nets. 



3 Net Constructions 

In this section, we will give constructions on nets which correspond to the 
expression operations above. Many of these are based on earlier work [9,8,20,3,15], 
but ours are the first direct constructions we have seen for net systems with final 
markings. 
We first recall a construction which does not affect the poset language 
accepted by a net. The place complement of a net adds a copy $\overline{P}$ of the set of 
places $P$ of $\mathcal{N}$ in such a way that a place contains a token iff the complementary 
place is empty [19]. The initial marking of the new net is $\overline{(P \setminus M_0)} \cup M_0$ 
and the final markings are $\overline{(P \setminus M_f)} \cup M_f$ for every $M_f \in \mathcal{F}$. 

We need a technical notion: the final width $fwd(u)$ of an sp-term $u$ is the 
width of its last sequential component. 

We call a net system $\mathcal{N} = (P, T, F, \ell, M_0, \mathcal{F})$ behaved if it satisfies the following 
properties: 

- ${}^\bullet M_0 = \emptyset$; ${}^\bullet(M_0{}^\bullet) = M_0$ (initial marking a source) 

- For every $M_f$ in $\mathcal{F}$, $M_f{}^\bullet = \emptyset$; $({}^\bullet M_f)^\bullet = M_f$ (final marking a sink) 

- For every $M_f$ in $\mathcal{F}$ such that $M_0 [u\rangle M_f$, $|M_f| = fwd(u)$ (final markings 
  separated by parallel width) 

- Distinct $M_f$, $M_f'$ in $\mathcal{F}$ are disjoint from each other and from $M_0$ 

- For every $M_1 [u \parallel v\rangle M_2$, $\mathcal{N}$ can be divided into disjoint parts $\mathcal{N}_u$, $\mathcal{N}_v$ such 
  that $M_1{\upharpoonright}\mathcal{N}_u\, [u\rangle\, M_2{\upharpoonright}\mathcal{N}_u$ and $M_1{\upharpoonright}\mathcal{N}_v\, [v\rangle\, M_2{\upharpoonright}\mathcal{N}_v$ (parallel sub-runs disjoint). 

Now we start constructing nets for expressions. The construction of a behaved 
net system for the expression $a$ is trivial; we call it an atomic net system. 

We give constructions for the other operations. Note that we use $+$ to indicate 
disjoint union. 

3.1 Union 



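Let $\mathcal{N}_i = (P_i, T_i, F_i, M_0^i, \mathcal{F}_i, \ell_i)$, $i = 1, 2$ be 1-safe net systems. Then their sum 
is the net system $\mathcal{N} = (P, T, F, M_0, \mathcal{F}, \ell)$ where 

- $P = P_1 + P_2 + P_{new}$; $P_{new} = (M_0^1 \times M_0^2)$ 

- $T = T_1 + T_2 + T_{new}$; $T_{new} = (M_0^1)^\bullet \cup (M_0^2)^\bullet$ 

- $\ell = \ell_1 \cup \ell_2 \cup \{(t \mapsto \ell_i(t)) \mid t \in T_{new} \cap T_i,\ i = 1, 2\}$ 

- $F = F_1 \cup F_2 \cup \{((p_1, p_2), t) \in P_{new} \times T_{new} \mid (p_1, t) \in F_1 \text{ or } (p_2, t) \in F_2\}$ 
  $\cup\ \{(t, p) \in T_{new} \times (P_1 \cup P_2) \mid (t, p) \in F_1 \text{ or } (t, p) \in F_2\}$ 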

- Mg = Pnew'-, T = Vd 

The new system adds to a copy of both $\mathcal{N}_i$ a fresh set of places for the marking 
$M_0^1 \times M_0^2$ (which is the new initial marking, and a final marking if either $M_0^i$ 
was a final marking) and a fresh copy of transitions enabled by each $M_0^i$. Figure 
1 shows an example. 

From the construction, we have for $\mathcal{N}$ that its initial marking is a source. If 
both the $\mathcal{N}_i$ are behaved, then the final markings of $\mathcal{N}$ are sinks, and are disjoint 
from each other. The parallel final width and disjoint sub-run conditions also 
follow from the behavedness of the $\mathcal{N}_i$. 

Proposition 4 $PL(\mathcal{N}) = PL(\mathcal{N}_1) \cup PL(\mathcal{N}_2)$. 
Fig. 1. $PL(1c) = PL(1a) \cup PL(1b)$ 



3.2 Sequencing 

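Let $\mathcal{N}_i = (P_i, T_i, F_i, M_0^i, \mathcal{F}_i, \ell_i)$, $i = 1, 2$ be 1-safe net systems. Then their 
sequential composition is the net system $\mathcal{N} = (P, T, F, M_0, \mathcal{F}, \ell)$ where 

- $P = P_1 + P_2 + P_{new}$; $P_{new} = \bigcup_{M_f^1 \in \mathcal{F}_1} (M_f^1 \times M_0^2)$ 

- $T = T_1 + T_2 + T_{new}$; $T_{new} = \bigcup_{M_f^1 \in \mathcal{F}_1} {}^\bullet(M_f^1) \cup (M_0^2)^\bullet$ 

- $\ell = \ell_1 \cup \ell_2 \cup \{(t \mapsto \ell_i(t)) \mid t \in T_{new} \cap T_i,\ i = 1, 2\}$ 

- $F = F_1 \cup F_2 \cup \{(p, t) \in P_1 \times T_{new} \mid (p, t) \in F_1\}$ 
  $\cup\ \{(t, (p_1, p_2)) \in T_{new} \times P_{new} \mid (t, p_1) \in F_1\}$ 
  $\cup\ \{((p_1, p_2), t) \in P_{new} \times T_{new} \mid (p_2, t) \in F_2\}$ 
  $\cup\ \{(t, p) \in T_{new} \times P_2 \mid (t, p) \in F_2\}$ 

- $M_0 = M_0^1$; $\mathcal{F} = \mathcal{F}_2$ 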

The new system uses a product $M_f^1 \times M_0^2$ for each $M_f^1 \in \mathcal{F}_1$, making 
copies of transitions coming in to $M_f^1$ as well as leaving $M_0^2$. $M_0^1$ is the initial 
marking, and $\mathcal{F}_2$ the final markings. This is illustrated in Figure 2. 

If both nets $\mathcal{N}_i$ are behaved, since the construction does not affect the initial 
marking of $\mathcal{N}_1$ or the final markings of $\mathcal{N}_2$, the source, sink and final width 
properties are maintained. The parallel sub-run property is also maintained by 
the argument presented next. 

Proposition 5 If $\mathcal{N}_1$ is place complemented and both $\mathcal{N}_i$ are behaved, 
$PL(\mathcal{N}) = PL(\mathcal{N}_1)PL(\mathcal{N}_2)$. 
Fig. 3. $PL(3b) = PL(3a)^+$ 



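$\cup\ \{((p_1, p_2), t) \in P_{new} \times T_{new} \mid (p_2, t) \in F_1\}$ 

$\cup\ \{(t, p) \in T_{new} \times P_1 \mid (t, p) \in F_1\}$ 

- $M_0 = M_0^1$; $\mathcal{F} = \mathcal{F}_1 \cup \mathcal{F}_2$ 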

As for sequencing, assume that $\mathcal{N}_1$ is behaved. Using the two copies maintains 
the source, sink and final width properties, and the argument for parallel 
sub-runs follows the one used above for sequencing. 

Proposition 6 If $\mathcal{N}_1$ is place complemented and behaved, then 
$PL(\mathcal{N}) = PL(\mathcal{N}_1)^+$. 

3.4 Parallel Composition 

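The parallel composition of two 1-safe net systems $\mathcal{N}_i = (P_i, T_i, F_i, M_0^i, \mathcal{F}_i, \ell_i)$, 
$i = 1, 2$ is the simple disjoint union $\mathcal{N} = (P, T, F, M_0, \mathcal{F}, \ell)$ where 

- $P = P_1 + P_2$; $T = T_1 + T_2$; $\ell = \ell_1 \cup \ell_2$; $F = F_1 \cup F_2$ 

- $M_0 = M_0^1 \cup M_0^2$; $\mathcal{F} = \{M_f^1 \cup M_f^2 \mid M_f^1 \in \mathcal{F}_1,\ M_f^2 \in \mathcal{F}_2\}$ 

Clearly if both nets $\mathcal{N}_i$ are behaved, so is $\mathcal{N}$, the construction being designed 
to preserve the parallel sub-run and final width properties. 

Proposition 7 $PL(\mathcal{N}) = PL(\mathcal{N}_1) \parallel PL(\mathcal{N}_2)$. 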
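
The disjoint-union construction can be sketched in Python as follows. The `NetSystem` record, the tagging scheme used to make the two nets disjoint, and the shape of the atomic net are all assumptions made for this illustration; transitions carry their own pre-sets and post-sets instead of a separate flow relation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NetSystem:
    places: frozenset
    transitions: frozenset  # of (name, label, preset, postset) tuples
    initial: frozenset      # initial marking, a set of places
    finals: frozenset       # set of final markings, each a frozenset of places

def tagged(net, tag):
    """Copy of `net` in which every place and transition name is paired with `tag`."""
    def mark(places):
        return frozenset((tag, p) for p in places)
    return NetSystem(
        places=mark(net.places),
        transitions=frozenset(((tag, name), label, mark(pre), mark(post))
                              for name, label, pre, post in net.transitions),
        initial=mark(net.initial),
        finals=frozenset(mark(f) for f in net.finals),
    )

def parallel(n1, n2):
    """Disjoint union; final markings are unions of one final marking from each net."""
    a, b = tagged(n1, 1), tagged(n2, 2)
    return NetSystem(
        places=a.places | b.places,
        transitions=a.transitions | b.transitions,
        initial=a.initial | b.initial,
        finals=frozenset(f1 | f2 for f1 in a.finals for f2 in b.finals),
    )

def atomic(label):
    """Hypothetical atomic net: one transition from a source place to a sink place."""
    return NetSystem(
        places=frozenset({"in", "out"}),
        transitions=frozenset({("t", label, frozenset({"in"}), frozenset({"out"}))}),
        initial=frozenset({"in"}),
        finals=frozenset({frozenset({"out"})}),
    )

net = parallel(atomic("a"), atomic("b"))
print(len(net.places), len(net.transitions), len(net.finals))  # 4 2 1
```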
3.5 Net Subclasses 

Definition. The subclass of net systems built up from the atomic systems using 
the constructions for union, sequencing, iteration and parallel composition is 
called (1-safe) SR-systems. 

Theorem 8 Given a series-rational expression $e$, one can construct an 
SR-system $\mathcal{N}$ such that $PL(\mathcal{N}) = L(e) \subseteq SP(A)$. 

Proof. By induction, using the Proposition corresponding to each operation. 

All posets accepted by an SR-system are N-free. By constructing the reachability 
graph of a net system and running through all four-tuples of transitions, the 
N-free property can be algorithmically checked (using polynomial space). We 
conjecture that there is a polynomial time algorithm to check that a 1-safe net 
system is SR. 

For net systems corresponding to the outermost/right series-rational expressions, 
we look at some well-known subclasses of net systems. (These are, in fact, 
the motivation for defining these expressions.) 

An S-system [11] is a net system where neither forking ($|t^\bullet| > 1$) nor joining 
($|{}^\bullet t| > 1$) transitions are allowed. In other words, for each transition, its pre-set 
and post-set are both singletons. The posets in the language accepted by an 
S-system satisfy the property that they are disjoint unions of chains, that is, they 
are outermost parallel. 

A communication-free (CF) system [5] is one where joining transitions are 
not allowed. The posets accepted by a CF-system are right parallel. 
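
Both structural conditions are simple to test. In the following Python sketch the function names and the encoding of transitions as `(label, preset, postset)` triples are assumptions made for this illustration.

```python
def is_s_system(transitions):
    """Neither forking nor joining: every pre-set and post-set is a singleton."""
    return all(len(pre) == 1 and len(post) == 1 for _, pre, post in transitions)

def is_cf_system(transitions):
    """Communication-free: no joining transition, i.e. no pre-set with two or more places."""
    return all(len(pre) <= 1 for _, pre, _ in transitions)

fork = ("a", frozenset({"p1"}), frozenset({"p2", "p3"}))   # forking transition
join = ("b", frozenset({"p2", "p3"}), frozenset({"p4"}))   # joining transition
step = ("c", frozenset({"p4"}), frozenset({"p5"}))

print(is_s_system([step]), is_s_system([fork]))    # True False
print(is_cf_system([fork]), is_cf_system([join]))  # True False
```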

The table in Figure 4 summarizes the closure properties of our net subclasses. 
Counter-examples are also listed. 



Closure under        Sequencing      Iteration        Union             Parallel 
1-safe SR-systems    Yes             Yes              Yes               Yes 
1-safe CF-systems    No, (a||b)c     No, (a||b)^+     No, (a||b) ∪ c    Yes 
1-safe S-systems     No, a(b||c)     No, (a||b)^+     No, (a||b) ∪ c    Yes 

Fig. 4. Constructions 

Theorem 9 Given a right series-rational expression $e$, one can construct a 1-safe
CF-system $\mathcal{N}$ such that $PL(\mathcal{N}) = L(e) \subseteq RSP(A)$.

Theorem 10 Given an outermost series-rational expression $e$, one can construct
a 1-safe S-system $\mathcal{N}$ such that $PL(\mathcal{N}) = L(e) \subseteq OSP(A)$.

4 From Nets to Expressions 

All the formal language machinery is now in place, and we can start the job of 
extracting syntax from systems. We crucially use the fact that for 1-safe systems 
the width of the sp-terms accepted is always bounded. 
4.1 1-Safe Nets

Lemma 11 Given a 1-safe system $\mathcal{N}$, there is a series-rational expression for
$PL(\mathcal{N}) \cap SP(A)$.



Proof. Our proof is by induction on the width of $PL(\mathcal{N}) \cap SP(A)$. We construct
expressions for the following sets:

$P^{k}_{M,N}$ is the set of sp-terms of width $k$ which are runs of $\mathcal{N}$ from the marking
$M$ to the marking $N$, such that the term is a parallel composition (or an atomic
action).

$S^{k}_{M,N}$ is the set of sp-terms of width $k$ which are runs of $\mathcal{N}$ from the marking
$M$ to the marking $N$, such that the term is a sequential product (or an atomic
action).

We will let $T^{k}_{M,N} = P^{k}_{M,N} \cup S^{k}_{M,N}$.

We now define the sets of parallel terms. $P^{1}_{M,N}$ is the set of all $a$ such that
a transition labelled $a$ takes $M$ to $N$. For $k > 1$, we have:

$$P^{k}_{M,N} = \bigcup_{\substack{j_1 + j_2 = k \\ M_1 + M_2 = M,\ N_1 + N_2 = N}} T^{j_1}_{M_1,N_1} \parallel T^{j_2}_{M_2,N_2}$$

The construction of $S^{k}_{M,N}$ from $P^{k}_{M,N}$ follows traditional automata-theoretic
lines. We will need to work with a specified set of connected intermediate
markings $R$. We define by induction on its size $|R|$ the set $S^{k,R}_{M,N}$ of
terms from $S^{k}_{M,N}$ which only use intermediate markings from the set $R$. We let
$T^{k,R}_{M,N} = P^{k}_{M,N} \cup S^{k,R}_{M,N}$.
Then $T^{k,\emptyset}_{M,N}$ is nothing but $P^{k}_{M,N}$. Inductively, for $r \notin R$, we have

$$S^{k,R \cup \{r\}}_{M,N} = S^{k,R}_{M,N} \cup \bigcup_{\substack{j_1, j_2, j_3 \le k \\ j_1 = k \text{ or } j_2 = k \text{ or } j_3 = k}} T^{j_1,R}_{M,r}\,\bigl(T^{j_2,R}_{r,r}\bigr)^{*}\,T^{j_3,R}_{r,N}$$

This is a generalization of the usual McNaughton-Yamada expression used for
word languages. The conditions on the union ensure that the width of the term
added is $k$.

By taking the entire set of such intermediate markings, we have the desired
series-rational expression.
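For comparison, the word-language construction that the proof generalizes can be sketched in a few lines. This is only the classical McNaughton-Yamada recurrence on a finite automaton, not the width-indexed version above; the function name and the edge representation (a dictionary from state pairs to sets of single-letter labels) are assumptions made for illustration. Like the runs considered in the proof, the expression produced covers non-empty paths only.

```python
import re
from itertools import product

def kleene_regex(states, edges, start, finals):
    """Regex (Python syntax) for the non-empty paths from start to a final state.

    R[(p, q)] describes the paths from p to q whose intermediate states lie
    in the set processed so far; None stands for the empty language.
    """
    def union(x, y):
        if x is None:
            return y
        if y is None:
            return x
        return f"(?:{x}|{y})"

    def concat(*parts):
        if any(p is None for p in parts):
            return None
        return "".join(parts)

    def star(x):
        return "" if x is None else f"(?:{x})*"

    R = {}
    for p, q in product(states, repeat=2):
        labels = sorted(edges.get((p, q), ()))
        R[(p, q)] = ("(?:" + "|".join(map(re.escape, labels)) + ")"
                     if labels else None)
    # Allow one more intermediate state r at each round.
    for r in states:
        R = {(p, q): union(R[(p, q)],
                           concat(R[(p, r)], star(R[(r, r)]), R[(r, q)]))
             for p, q in product(states, repeat=2)}
    result = None
    for f in finals:
        result = union(result, R[(start, f)])
    return result

# Two states, a from 0 to 1 and b from 1 to 0: the paths from 0 to 1 spell a(ba)*.
example = kleene_regex([0, 1], {(0, 1): {"a"}, (1, 0): {"b"}}, 0, [1])
```

The update inside the loop has the same shape as the inductive step for $r \notin R$ in the proof, with the width bookkeeping dropped.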



Theorem 12 The following classes of sp-languages are equivalent:

(1) The series-rational languages.

(2) The poset languages accepted by 1-safe SR-systems.

(3) The sp-languages accepted by 1-safe systems.

(4) The recognizable sp-languages of bounded width.

(5) The MSO-definable sp-languages of bounded width.
Proof. ($1 \Rightarrow 2 \Rightarrow 3$): Theorem 8.

($3 \Rightarrow 1$): Let $K$ be the poset language accepted by a 1-safe system. Hence
$L = K \cap SP(A)$ is an sp-language. By the lemma above, there is a series-rational
expression for it.

($1 \Leftrightarrow 4$): From [13], we know that the series-rational sp-languages are the
recognizable sp-languages of bounded width. (The paper also gives a purely
algebraic condition, expressed in terms of nilpotency of the parallel product,
which corresponds to bounded width.)

($4 \Leftrightarrow 5$): From [12], we know that these coincide with the MSO-definable
sp-languages of bounded width.

The paper [13] also characterizes these languages as those accepted by certain
“branching automata”. It would be interesting to find a direct translation
between these and 1-safe nets preserving the accepted sp-language. From the
results of [13], we also get:

Corollary 13 The languages accepted by 1-safe SR-systems are closed under
intersection, direct and inverse morphism.

4.2 Subclasses 

Theorem 14 The following classes of rsp-languages are equivalent, and closed
under intersection.

(1) The right series-rational languages.

(2) The poset languages accepted by 1-safe CF-systems.

(3) The rsp-languages accepted by 1-safe systems.

(4) The recognizable rsp-languages of bounded width.

(5) The MSO-definable rsp-languages of bounded width.

Proof. ($1 \Rightarrow 2 \Rightarrow 3$): Theorem 9.

($3 \Rightarrow 1$): Let $K$ be the poset language accepted by a 1-safe system. Then it
accepts the rsp-language $L = K \cap RSP(A)$. Now $K \cap SP(A)$ is series-rational
and $RSP(A)$ is MSO-definable, therefore series-rational by Theorem 12, so their
intersection $L$ is series-rational by Corollary 13. Since $L \subseteq RSP(A)$, the
generating expression for it must be right series-rational.

($1 \Leftrightarrow 4 \Leftrightarrow 5$): Proposition 3 and Proposition 2. The closure under
intersection follows.

By simple modifications of the preceding arguments, we get: 

Theorem 15 The following classes of osp-languages are equivalent, and closed
under intersection.

(1) The outermost series-rational languages.

(2) The poset languages accepted by 1-safe S-systems.

(3) The osp-languages accepted by 1-safe systems.

(4) The recognizable osp-languages of bounded width.

(5) The MSO-definable osp-languages of bounded width.
Fig. 5. Relationships



Discussion. Figure 5 summarizes the results of this paper. We believe our
characterizations also work for k-safe nets, using the results of [2]. Extending
the work to general (unsafe) nets is more challenging.

Petri nets are typically meant to model infinite behaviour. A theory of
$\omega$-rational expressions for N-free poset languages has been developed by Kuske [12],
and a similar development can be done in our setup without too much difficulty
[18].

Nets have a richer interaction structure than can be described by sp-terms. 
In our setup, the obvious way to move towards the full class of 1-safe Petri nets 
would be to include in our terms an operation corresponding to synchronization 
(as has been done in PBC [1]). However, carrying all the equivalences forward
seems to be difficult. 
Abstract. In this paper, we introduce the crypto-loc calculus, a calculus
for modelling secure mobile computations that combine the concepts 
of locations, cryptography, and code mobility. All these concepts exist 
in mobile systems, for example, Java applets run within sandboxes or 
downloaded under an SSL connection. We use observational equivalence 
of processes as a powerful means of defining security properties, and 
characterize observational equivalence in terms of a labelled bisimilarity 
relation, which makes its proof much easier. 



1 Introduction 

The issue of security has become central in mobile computing systems as a result 
of the increasing technological advances enjoyed in recent years and the appear- 
ance of new security threats associated with such advances. In process algebraic 
models of mobility, the modelling of the various actions that the system can 
perform has been captured in terms of a few important concepts. Among these 
are the concepts of location, cryptography, and code mobility. On most occasions,
these concepts have been studied separately. Here, we combine the three con- 
cepts into one model with the goal of capturing the security and mobility features 
that already exist in platforms like Java and .NET, in a simple and elegant cal- 
culus. This combination makes it possible to analyze the security properties of 
cryptographic protocols also involving mobility, for instance. 

Locations have significant security effects; they are name-protected areas of 
computation, which provide a handy isolation mechanism often useful in the 
modelling of concepts like firewalls and sandboxes. Communications between lo- 
cations are generally restricted to either communications local to one location 
(ambients [12]), or communications with locations immediately inside or outside 
a location (boxed ambients [10]). The former implies using an opening mecha- 
nism to destroy ambient boundaries, in order to allow communication between 
ambients. This weakens the isolation power of locations considerably and has 
security drawbacks already recognized in previous work [10], so we prefer the 
latter choice. 

The mobility of locations may be classified into subjective and objective. Sub- 
jective mobility (as in, e.g., the ambients calculus [12]) implies the presence of 
language primitives that can be used by a location as commands to directly
control its movement. On the other hand, objective mobility (as in, e.g., the 
seal calculus [20]) dictates that locations be controlled by their context, which 
has primitives expressive enough to be able to send and receive locations. Fur- 
thermore, one can distinguish weak mobility, in which the received code always 
restarts its execution in the initial state, strong mobility, where a thread is sent 
along with its execution state, and full mobility, where a full program is sent 
along with its state [8]. In our model, certain kinds of data represent code. These 
data can be manipulated like any other piece of data: they can be sent and re- 
ceived (which provides weak objective mobility of any code, not just locations), 
but also manipulated by cryptographic primitives. In contrast to ordinary data, 
they can also be executed. This modelling of mobility corresponds to what hap- 
pens in systems such as Java and .NET, and is also easier to implement than 
strong or full mobility. 

Our representation of cryptography is borrowed from the applied pi calcu- 
lus [3], in which cryptographic primitives are defined as functions satisfying an 
equational theory. This provides a generic treatment of primitives, which pro- 
motes a uniform understanding of the underlying theory and yields a powerful 
technique for specifying cryptographic systems. 

As often in process calculi, some of these features can be encoded. Location- 
based languages can encode cryptographic operations, including symmetric and 
asymmetric encryption [12,20]. (That would probably be more difficult for prim- 
itives such as xor.) Strong mobility can be encoded from weak mobility [8]. The 
execution of certain kinds of data can be encoded by an interpreter (such an 
interpreter has already been defined for the pi calculus [14]), and strong mo- 
bility could be encoded by an improved version of such an interpreter. Even if 
encodings exist, we choose to provide as primitives in our calculus the features 
that exist in Java and .NET, i.e. cryptography, sandboxes modelled by locations, 
and weak code mobility. It is obviously possible to encode other features in our 
calculus, such as strong mobility, much like one could encode them in Java. 

The combination of locations, cryptography, and weak code mobility enables 
us to model scenarios, such as Java applets, in which code is sent on the net- 
work after a cryptographic treatment (signed applets) or inside a cryptographic 
protocol (applets sent over SSL, for instance). Based on the amount of trust 
established for an applet, it is then possible to run it inside or outside a sandbox 
(location). Similarly, we could also model applets that execute a cryptographic 
protocol, in which case the transmitted code uses cryptographic primitives. We 
develop a general technique based on observational equivalence to prove security 
properties, and use it to show some of the main properties of such examples. 

Outline. In Sect. 2, we present the syntax and operational semantics of the cal- 
culus. In Sect. 3, we introduce and relate observational equivalence and labelled 
bisimilarity. In Sect. 4, we introduce a few examples of systems that could benefit 
from the calculus and discuss their security properties. In Sect. 5, we compare 
with related work and finally, in Sect. 6, we conclude. 
2 Process Calculus

The syntax of the crypto-loc calculus is defined in Fig. 1. In this calculus, terms
$M, N$ represent passive data such as messages. These are either atomic terms,
like names $a, b, c$ and variables $x, y, z$, or complex terms that result from the
application of functions $f$ defined by a signature $\Sigma$. Names represent atomic
data, such as keys, while variables will later be substituted by some possibly
complex message. The signature $\Sigma$ consists of a finite set of function symbols,
each with an associated arity. It is equipped with an equational theory, that is, a
congruence relation on terms closed under substitutions of terms for free variables
and names. We denote by $\Sigma \vdash M = N$ the equality of $M$ and $N$ modulo the
equational theory, and we assume that there exist two different terms modulo
this equational theory.

The signature $\Sigma$ that we adopt in this paper mainly concentrates on
cryptographic functions, such as symmetric and asymmetric encryption/decryption,
digital signatures/verification, etc. For instance, shared-key cryptography can be
defined by two functions, encrypt for encryption and decrypt for decryption,
with the equation decrypt(encrypt(x, y), y) = x. Data structures can also be
encoded by including functions for tuple creation and projections. Sometimes we
abbreviate tuples $(M_1, \ldots, M_n)$ as the vector $\tilde{M}$. While, in this paper, our
examples use only simple cryptographic primitives, the technique can also model
complex cases such as xor and Diffie-Hellman key agreements [3]. The term
pack($\tilde{M}$, $\lambda\tilde{x}.P$) transforms a process $P$ and a tuple $\tilde{M}$ into data that can be
treated as any other passive term. This removes the need for special input/output
actions for processes. We postpone the detailed explanation of this term until after
the explanation of processes.
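To make the shared-key equation concrete, here is a minimal sketch of terms and of equality modulo the single equation decrypt(encrypt(x, y), y) = x. The encoding is an assumption made for illustration: names and variables are strings, function applications are tuples whose first component is the function symbol, and the equation is read as a left-to-right rewrite rule applied bottom-up. For this one equation, comparing normal forms is a reasonable way to decide equality; richer theories such as xor would need more than this.

```python
def normalize(term):
    """Rewrite a term bottom-up with decrypt(encrypt(x, y), y) -> x."""
    if isinstance(term, str):          # a name or a variable
        return term
    head, *args = term
    args = [normalize(a) for a in args]
    if head == "decrypt" and len(args) == 2:
        cipher, key = args
        if (isinstance(cipher, tuple) and len(cipher) == 3
                and cipher[0] == "encrypt" and cipher[2] == key):
            return cipher[1]           # already normalized
    return (head, *args)

def equal_modulo_theory(m, n):
    """Equality of two terms modulo the shared-key equation."""
    return normalize(m) == normalize(n)

ciphertext = ("encrypt", "m", "k")
```

Decrypting `ciphertext` with `"k"` normalizes to `"m"`, whereas decrypting with any other key leaves the term stuck.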

The syntax of processes $P, Q$ is described as follows. A null process $0$ is an
inactive process unable to evolve any further. The process $P \mid Q$ denotes the
parallel composition of two processes $P$ and $Q$. A replicated process $!P$ spawns an
unbounded number of copies of $P$. Restriction $(\nu a)P$ localizes the name $a$ within
the scope of $P$. A location $M[P]$ embodies the process $P$. The process $P$ is said
to reside at $M$. As in boxed ambients [10], an input action $M^{\eta}(\lambda x).P$ has three
types: local-input $M^{*}(\lambda x).P$, up-input $M^{\uparrow}(\lambda x).P$ and down-input $M^{N}(\lambda x).P$.
A local input can only synchronize with output actions that are performed by
processes local to the current location. Up-input can only receive messages from
processes resident in the environment location enclosing the current location.
A down-input $M^{N}(\lambda x).P$ can only receive messages from the sublocation $N$
enclosed by the current location. Similarly, output actions $M^{\eta}\langle M'\rangle.P$ are divided
into local-, up- and down-outputs. For instance, in the process
$c[a^{\uparrow}(\lambda y).P] \mid a^{c}\langle M\rangle.Q$, the down-output $a^{c}\langle M\rangle$ can send its message to the
up-input $a^{\uparrow}(\lambda y)$ inside location $c$. The terms that represent channels in input/output actions,
their $\eta$ parameter, and the names of locations must reduce to names at runtime.
Otherwise, the process blocks. The exec($M$) process executes $M$ when $M$ is a
packed process. Otherwise, it blocks. Finally, a conditional process compares two
terms, $M$ and $N$, and branches to either $P$ or $Q$, depending on whether $\Sigma \vdash M = N$
holds or not. As usual, we may omit an else clause when it consists of $0$.
$M, N ::=$                                     terms
    $x, y, z$                                  variable
    $a, b, c, k$                               name
    $f(M_1, \ldots, M_n)$                      function application
    pack($\tilde{M}$, $\lambda\tilde{x}.P$)    processes (with $fn(P) = \emptyset$, $fv(P) \subseteq \{\tilde{x}\}$)

$\eta ::=$                                     communication target
    $M$                                        down
    $\uparrow$                                 up
    $*$                                        local

$P, Q ::=$                                     processes
    $0$                                        null
    $P \mid Q$                                 parallel composition
    $!P$                                       replication
    $(\nu a)P$                                 restriction
    $M^{\eta}(\lambda x).P$                    input
    $M^{\eta}\langle M'\rangle.P$              output
    $M[P]$                                     location
    exec($M$)                                  execute packed process
    if $M = N$ then $P$ else $Q$               conditional

Fig. 1. Syntax of the process calculus



The substitution that replaces $x$ with $M$ is denoted by $\{M/x\}$. We extend
this to tuples $\{\tilde{M}/\tilde{x}\}$. We denote by $P\sigma$ (resp. $M\sigma$) the application of the
substitution $\sigma$ to process $P$ (resp. term $M$). We define let $x = M$ in $P$ as
syntactic sugar for $P\{M/x\}$. We denote by $\{\tilde{x}\}$ the set of variables in the tuple
$\tilde{x}$. We denote by $fn$ the free names and by $fv$ the free variables of a process or a
term, defined as usual. In pack($\tilde{M}$, $\lambda\tilde{x}.P$), the variables $\tilde{x}$ are bound. A process
or a term is closed when it contains no free variable. It may have free names.

Let us now explain packed processes. Intuitively, in pack($\tilde{M}$, $\lambda\tilde{x}.P$), $P$
represents the code of the process and the tuple $\tilde{M}$ represents its data. When executing
such a process, the tuple $\tilde{M}$ is substituted for the variables $\tilde{x}$ in $P$, and the
resulting process is executed. We include a function getdata $\in \Sigma$, used (by malicious
processes) to extract the data $\tilde{M}$, defined by getdata(pack($\tilde{M}$, $\lambda\tilde{x}.P$)) $= \tilde{M}$.

The separation between data and code is important for security considerations,
since an adversary can extract all data carried by a process. So, for
example, the process:

pack($(s, k, a)$, $\lambda(x, y, z).z^{\uparrow}\langle \mathrm{encrypt}(x, y)\rangle$)

reveals $s$, $k$, $a$ when it is sent in clear over the network, whereas the process:

pack($(\mathrm{encrypt}(s, k), a)$, $\lambda(x, z).z^{\uparrow}\langle x\rangle$)

reveals only encrypt($s$, $k$) and $a$, but not $s$ when the adversary does not have $k$.
Hence, these two processes must be represented differently, even if they perform
the same actions when they are executed. The separation of code and data is
enforced by the conditions $fn(P) = \emptyset$ and $fv(P) \subseteq \{\tilde{x}\}$ that prevent the
presence of data in $P$. The equational theory is such that
$\Sigma \vdash \mathrm{pack}(\tilde{M}, \lambda\tilde{x}.P) = \mathrm{pack}(\tilde{M}', \lambda\tilde{x}'.P')$
if and only if $\Sigma \vdash \tilde{M} = \tilde{M}'$ and $P'\{\tilde{x}/\tilde{x}'\} \equiv P$ (with $fn(P) =
fn(P') = \emptyset$, $fv(P) \subseteq \{\tilde{x}\}$, and $fv(P') \subseteq \{\tilde{x}'\}$), where $\equiv$ is the structural
equivalence defined below.
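The difference between the two packed processes can be mimicked with a small data structure. This is only an illustrative sketch: the class `Pack`, its field names, and the textual rendering of the code component are hypothetical, and the code is kept as an opaque string rather than as a real process.

```python
from typing import NamedTuple, Tuple

class Pack(NamedTuple):
    data: Tuple              # the data tuple carried by the packed process
    params: Tuple[str, ...]  # the bound variables of the code
    code: str                # the code, kept as opaque text in this sketch

def getdata(packed: Pack) -> Tuple:
    """Extract the data of a packed process, as the adversary may do."""
    return packed.data

def bindings(packed: Pack) -> dict:
    """The substitution applied when the packed process is executed."""
    return dict(zip(packed.params, packed.data))

# Encrypts at run time: the secret s and the key k travel as data.
p1 = Pack(("s", "k", "a"), ("x", "y", "z"), "output encrypt(x, y) on z")
# Encrypts before packing: only the ciphertext and a travel as data.
p2 = Pack((("encrypt", "s", "k"), "a"), ("x", "z"), "output x on z")
```

Calling `getdata(p1)` hands the adversary `s` and `k` directly, while `getdata(p2)` yields only the ciphertext and `a`, even though both packages emit the same message once executed.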

The adversary can also observe the code of packed processes, by testing 
equality with processes it builds. Hence, the process: 

$c^{\uparrow}(\lambda x)$.if $x$ = pack(getdata($x$), $\lambda\tilde{y}.P$) then . . . else . . .

tests whether the code of $x$ (received on $c$) is structurally equivalent to $P$ (since
the equality is defined as structural equivalence for processes). We could optionally
add other functions to manipulate and transform the code of packed pro-
cesses, but getdata already gives enough power to the adversary. Indeed, the 
adversary can build any process, so it can rebuild the code of packed processes 
without having a primitive to extract it. (Note, however, that scoping constraints 
introduced by the restriction prevent the adversary from using private data of 
processes, when it does not learn them by listening to communications.) 

The formal semantics of the crypto-loc calculus is given in Fig. 2. Note that 
all rules apply only to closed processes and that in (11), M and M' must be 
closed. Processes are considered equal modulo renaming of bound names and 
variables. Processes are prepared syntactically for communications by applying 
the structural equivalence relation $\equiv$, which is defined as the least equivalence
satisfying the first set of rules of Fig. 2. The second set of rules of Fig. 2
introduces the reduction relation $\to$. The first three rules (Red Local), (Red Down),
and (Red Up) deal with local, downward, and upward communications with
respect to the process performing the output action. Rule (Red Exec) deals with
the execution of packed processes. Other rules are standard. 

As an example, let us consider the following process: 

$(\nu k)(c^{*}\langle \mathrm{pack}((d, \mathrm{encrypt}(m, k)), \lambda(x_d, x_e).x_d^{\uparrow}(\lambda z).x_d^{*}\langle \mathrm{decrypt}(x_e, z)\rangle)\rangle$

$\mid c^{*}(\lambda y).b[\mathrm{exec}(y) \mid d^{*}(\lambda x_m).0] \mid d^{b}\langle k\rangle)$

This process first creates a new key $k$, shared between the processes that execute
in parallel. The first element of the parallel composition outputs a packed process
on the public channel $c$; the second one receives it, and executes it inside a
location $b$. After these steps, the process becomes

$(\nu k)(0 \mid b[d^{\uparrow}(\lambda z).d^{*}\langle \mathrm{decrypt}(\mathrm{encrypt}(m, k), z)\rangle \mid d^{*}(\lambda x_m).0] \mid d^{b}\langle k\rangle)$

The process that comes from the packed process waits for a message on channel
$d$ from outside location $b$. So it can receive the key $k$ sent by the last element
of the parallel composition. After receiving this key, it decrypts the message
encrypt($m$, $k$) with it, and outputs the result $m$ on channel $d$ locally inside
location $b$. The message $m$ is then received by $d^{*}(\lambda x_m).0$.
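The run just described can be written out step by step. The derivation below is a sketch under two assumptions: an output written without a continuation stands for an output followed by $0$, and the superscripts $*$, $\uparrow$ and $b$ denote local, upward and downward (into $b$) communication, as in the reduction rules. Each step is also closed under (Red Par), (Red Res) and (Red Loc) as needed, and $D$ abbreviates $d^{*}(\lambda x_m).0$.

```latex
\begin{align*}
 & (\nu k)\bigl(c^{*}\langle \mathrm{pack}((d, \mathrm{encrypt}(m,k)),
     \lambda(x_d, x_e).\,x_d^{\uparrow}(\lambda z).x_d^{*}\langle \mathrm{decrypt}(x_e, z)\rangle)\rangle
     \mid c^{*}(\lambda y).b[\mathrm{exec}(y) \mid D] \mid d^{b}\langle k\rangle\bigr) \\
\to\ & (\nu k)\bigl(0 \mid b[\mathrm{exec}(\mathrm{pack}(\ldots)) \mid D] \mid d^{b}\langle k\rangle\bigr)
     && \text{(Red Local)} \\
\to\ & (\nu k)\bigl(0 \mid b[d^{\uparrow}(\lambda z).d^{*}\langle \mathrm{decrypt}(\mathrm{encrypt}(m,k), z)\rangle \mid D]
     \mid d^{b}\langle k\rangle\bigr)
     && \text{(Red Exec)} \\
\to\ & (\nu k)\bigl(0 \mid b[d^{*}\langle \mathrm{decrypt}(\mathrm{encrypt}(m,k), k)\rangle \mid D] \mid 0\bigr)
     && \text{(Red Down)} \\
\equiv\ & (\nu k)\bigl(0 \mid b[d^{*}\langle m\rangle \mid D] \mid 0\bigr)
     && \text{(rule (11), } \mathrm{decrypt}(\mathrm{encrypt}(m,k),k) = m\text{)} \\
\to\ & (\nu k)\bigl(0 \mid b[0 \mid 0] \mid 0\bigr)
     && \text{(Red Local)}
\end{align*}
```

The last step substitutes $m$ for $x_m$ in the continuation $0$, which leaves it unchanged.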




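Structural equivalence $P \equiv P'$:

$P \mid 0 \equiv P$                                              (1)
$P \mid Q \equiv Q \mid P$                                       (2)
$(P \mid Q) \mid R \equiv P \mid (Q \mid R)$                     (3)
$P \equiv Q \Rightarrow P \mid R \equiv Q \mid R$                (4)
$P \equiv Q \Rightarrow (\nu a)P \equiv (\nu a)Q$                (5)
$P \equiv Q \Rightarrow M[P] \equiv M[Q]$                        (6)
$!P \equiv P \mid\, !P$                                          (7)
$(\nu a_1)(\nu a_2)P \equiv (\nu a_2)(\nu a_1)P$                 (8)
$(\nu a)(P \mid Q) \equiv P \mid (\nu a)Q$ if $a \notin fn(P)$   (9)
$(\nu a)M[P] \equiv M[(\nu a)P]$ if $a \notin fn(M)$             (10)
$P\{M/x\} \equiv P\{M'/x\}$ if $\Sigma \vdash M = M'$            (11)

Reduction $P \to P'$:

$a^{*}(\lambda y).P' \mid a^{*}\langle M\rangle.Q' \to P'\{M/y\} \mid Q'$                          (Red Local)
$c[a^{\uparrow}(\lambda y).P' \mid R] \mid a^{c}\langle M\rangle.Q' \to c[P'\{M/y\} \mid R] \mid Q'$   (Red Down)
$a^{c}(\lambda y).P' \mid c[a^{\uparrow}\langle M\rangle.Q' \mid R] \to P'\{M/y\} \mid c[Q' \mid R]$   (Red Up)
exec(pack($\tilde{M}$, $\lambda\tilde{x}.P$)) $\to P\{\tilde{M}/\tilde{x}\}$                        (Red Exec)
if $M = M$ then $P$ else $Q$ $\to P$                                                                 (Red Then)
if $M = N$ then $P$ else $Q$ $\to Q$ if $\Sigma \nvdash M = N$                                       (Red Else)
$P \to Q \Rightarrow P \mid R \to Q \mid R$                                                          (Red Par)
$P \to Q \Rightarrow (\nu a)P \to (\nu a)Q$                                                          (Red Res)
$P \to Q \Rightarrow M[P] \to M[Q]$                                                                  (Red Loc)
$P' \equiv P,\ P \to Q,\ Q \equiv Q' \Rightarrow P' \to Q'$                                          (Red $\equiv$)

Fig. 2. Structural equivalence and reduction relations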



3 Process Equivalences 

Observational equivalence is a powerful means for defining security properties of 
processes and for reasoning about them. In this section, we define observational 
equivalence and show that a labelled bisimilarity relation can be used to provide 
simpler proofs of observational equivalence. Thus, we extend the results proved 
in [3] for the applied pi calculus to locations and mobility. 

Intuitively, two processes are observationally equivalent when an adversary
cannot distinguish them. The adversary is formalized as any evaluation context,
defined as follows: An evaluation context $C$ is a context formed with the hole $\{\}$,
restrictions $(\nu a)C$, parallel compositions $P \mid C$, and locations $M[C]$. Intuitively,
the adversary can listen to messages sent on the network, compute its own
messages, and send them. It can also insert the considered process inside locations.
The application of context $C$ to process $P$ is denoted by $C\{P\}$. An evaluation
context $C$ closes $P$ when $C\{P\}$ is closed. We say that a process $P$ emits on
channel $a$, and we write $P \Downarrow a$, when
$P\,(\to \cup \equiv)^{*}\,(\nu b_1) \ldots (\nu b_n)(a^{*}\langle M\rangle.P' \mid Q)$
with $a \notin \{b_1, \ldots, b_n\}$.
Definition 1. Observational equivalence $\approx$ is the largest symmetric relation $\mathcal{R}$
between closed processes such that $P \mathrel{\mathcal{R}} Q$ implies:

(1) if $P \Downarrow a$, then $Q \Downarrow a$;

(2) if $P \to^{*} P'$ then there exists $Q'$ such that $Q \to^{*} Q'$ and $P' \mathrel{\mathcal{R}} Q'$;

(3) $C\{P\} \mathrel{\mathcal{R}} C\{Q\}$ for all closing evaluation contexts $C$.

Observational equivalence can be used to define security properties of processes, 
by showing that the process is observationally equivalent to an ideal process 
representing the security specification of the system. It can also be used to define 
secrecy based on the notion of non-interference: 

Definition 2. A process $P$ preserves the secrecy of $fv(P)$ if and only if for all
substitutions $\sigma$ and $\sigma'$ of domain $fv(P)$, $P\sigma \approx P\sigma'$.

This definition intuitively expresses that the adversary obtains no information on 
the value of the variables in fv{P), since processes using different values for these 
variables are observationally equivalent, i.e. indistinguishable by the adversary. 
This notion of secrecy makes it possible to detect implicit flows, for instance 
when the adversary can test whether a secret is equal to some value it has. It is 
stronger than the perhaps more frequent reachability notion, saying that a value 
is secret when the adversary cannot reconstruct it. We refer the reader to [1] for 
a more detailed discussion of the definition of secrecy. 

Proving observational equivalence is difficult, because we have to prove equiv- 
alence for all possible contexts (point (3) of the definition). We define another 
process equivalence, namely labelled bisimilarity, that avoids the universal quan- 
tification on contexts. We show that labelled bisimilarity equals observational 
equivalence. Such results are frequent in process algebras, but they are also im- 
portant: they make it possible to replace a proof of observational equivalence 
with a much simpler proof of labelled bisimilarity. 

The definition of labelled bisimilarity proceeds in several steps. First, we
define the notion of agents. Agents represent the current state of the process as well
as the current knowledge of the adversary. An agent $A$ is of the form $(\nu\tilde{a})(\sigma, P)$,
where $\tilde{a}$ is a set of names, $\sigma$ is a substitution from variables to terms, and $P$ is
a process, such that $dom(\sigma) \cap (fv(P) \cup \bigcup_x fv(x\sigma)) = \emptyset$. In the agent $(\nu\tilde{a})(\sigma, P)$,
$P$ represents the current state of the process and $\sigma$ the current knowledge of
the adversary, that is, the adversary knows all terms in the image of $\sigma$, and it
can designate them by using the variables in the domain of $\sigma$. The restriction
$(\nu\tilde{a})$ indicates that the adversary does not have a priori knowledge of the names
in $\tilde{a}$. For example, the agent $(\nu k)(\{\mathrm{encrypt}((a, b), k)/x\}, P)$ represents that the
adversary has the term $M = \mathrm{encrypt}((a, b), k)$, but not the key $k$, so it cannot
see that $M$ is an encryption, but it can build $(x, a)$, that is, $(M, a)$, for instance.
If it later learns $k$, with an agent such as $(\nu k)(\{\mathrm{encrypt}((a, b), k)/x, k/y\}, P')$,
then it can decrypt $M$ by computing decrypt($x$, $y$), which is equal to $(a, b)$ taking
into account the values of $x$ and $y$. We define $dom((\nu\tilde{a})(\sigma, P)) = dom(\sigma)$.
An agent is closed when the image of $\sigma$ and the process $P$ do not contain free
variables. A process is also an agent, by taking $\tilde{a}$ and $dom(\sigma)$ empty.
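The role of the substitution as adversary knowledge can be illustrated as follows. This is a sketch under the same illustrative encoding as before (strings for names and variables, tuples for function applications, the shared-key equation read as a rewrite rule); the helper names and the symbol `pair` are assumptions, not part of the calculus.

```python
def normalize(term):
    """Rewrite a term bottom-up with decrypt(encrypt(x, y), y) -> x."""
    if isinstance(term, str):
        return term
    head, *args = term
    args = [normalize(a) for a in args]
    if head == "decrypt" and len(args) == 2:
        cipher, key = args
        if (isinstance(cipher, tuple) and len(cipher) == 3
                and cipher[0] == "encrypt" and cipher[2] == key):
            return cipher[1]
    return (head, *args)

def apply_subst(sigma, term):
    """Replace the variables in the domain of sigma by their values."""
    if isinstance(term, str):
        return sigma.get(term, term)
    head, *args = term
    return (head, *(apply_subst(sigma, a) for a in args))

def evaluate(sigma, recipe):
    """What a term built by the adversary denotes, given its knowledge sigma."""
    return normalize(apply_subst(sigma, recipe))

# Knowledge before and after the key k is learnt (k is bound to y).
before = {"x": ("encrypt", ("pair", "a", "b"), "k")}
after = {**before, "y": "k"}
recipe = ("decrypt", "x", "y")
```

With `before`, the recipe stays stuck as a decryption with the unknown `y`; with `after`, it evaluates to the pair of `a` and `b`.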
Agents are considered equal modulo renaming of bound names and variables.
The structural equivalence and reduction of agents extend the ones of processes.
More precisely, the structural equivalence of agents is defined as the smallest
equivalence relation closed under:

$(\nu\tilde{a})(\sigma, P) \equiv (\nu\tilde{a})(\sigma', P)$ if $\forall x,\ \Sigma \vdash x\sigma = x\sigma'$
$(\nu\tilde{b}, a)(\sigma, P) \equiv (\nu\tilde{b})(\sigma, (\nu a)P)$ if $\forall x,\ a \notin fn(x\sigma)$
$P \equiv P' \Rightarrow (\nu\tilde{a})(\sigma, P) \equiv (\nu\tilde{a})(\sigma, P')$

The reduction of agents is defined by $A \to B$ when $A \equiv (\nu\tilde{b})(\sigma, P)$, $P \to Q$, and
$(\nu\tilde{b})(\sigma, Q) \equiv B$.

The labelled operational semantics represents reductions of processes that
occur in interaction with an unspecified context. It is defined by the rules of
Fig. 3. The label $\alpha$ of the reduction indicates the nature of the interaction with
this context: it can be of the form $M^{\eta}\langle x\rangle$ (output of a term on channel $M$,
the term is received in variable $x$ by the context), $M^{\eta}(\lambda M')$ (input of a term
$M'$ on channel $M$), $M''[M^{\uparrow}\langle x\rangle]$, or $M''[M^{\uparrow}(\lambda M')]$ (output and input through a
location). The rules defining the labelled reduction $A \xrightarrow{\alpha} A'$ apply only when $A$
is closed, and then $A'$ is closed.

The rule (In) means that the process $a^{\eta}(\lambda x).P$ can receive a term $M$ from
the context, then it reduces to $P\{M/x\}$. The rule (Out) means that the process
$a^{\eta}\langle M\rangle.P$ can output the term $M$ to the context, which receives it in some
variable $x$. The reduced process is $P$, and the context now has knowledge of the
term $M$, which is represented by the substitution $\{M/x\}$. The rules (Loc In)
and (Loc Out) handle inputs and outputs inside a location. Only inputs and
outputs to the outside of the location (using $\uparrow$: $a^{\uparrow}(\lambda x)$ and $a^{\uparrow}\langle M\rangle$) are visible
to the context, so this is the only case considered in these rules. Intuitively, in
these rules, $c[P]$ reduces to $c[A]$, but $c[A]$ is outside the syntax of agents, so we
replace it with $c[] \circ A$ which plays the same role but yields a well-formed agent.
$A \diamond Q$, $(\nu a) \circ A$, and $\sigma \circ A$ are used for the same reason in the following rules. The
rules (Parallel), (Restriction), and (Agent) handle reductions under evaluation
contexts. In the rule (Restriction), note that when $a \in fn(\alpha)$, the context needs
to know $a$ to perform the reduction $P \xrightarrow{\alpha} A$, so it cannot perform this reduction
on the process $(\nu a)P$. The rule (Agent) is complex. It can be decomposed into
three steps: First, from $P \xrightarrow{\alpha} A$, we can infer $(\sigma, P) \xrightarrow{\alpha} \sigma \circ A$, provided that the
domain of $\sigma$ does not overlap with the domain of the substitution of $A$, that
is, with the bound variables of $\alpha$. Second, we can replace the label $\alpha$ with $\alpha'$,
provided that $\Sigma \vdash \alpha\sigma = \alpha'\sigma$, that is, $\alpha$ and $\alpha'$ have the same value given the
current knowledge of the adversary. Finally, we add the restriction $(\nu\tilde{a})$, in a
way similar to the rule (Restriction). The rule (Struct) allows using structural
equivalence to transform agents before reducing them.

We now define the static equivalence of agents. Two agents A and B are 
statically equivalent when the adversary cannot distinguish the terms given by 
the substitutions of these agents (without executing the processes associated 
with each of A and B).
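(In)           $a^{\eta}(\lambda x).P \xrightarrow{a^{\eta}(\lambda M)} P\{M/x\}$   ($M$ closed, $\eta$ is $*$, $\uparrow$, or a name)

(Out)          $a^{\eta}\langle M\rangle.P \xrightarrow{a^{\eta}\langle x\rangle} (\{M/x\}, P)$   ($\eta$ is $*$, $\uparrow$, or a name)

(Loc In)       $P \xrightarrow{a^{\uparrow}(\lambda M)} A \ \Rightarrow\ c[P] \xrightarrow{c[a^{\uparrow}(\lambda M)]} c[] \circ A$

(Loc Out)      $P \xrightarrow{a^{\uparrow}\langle x\rangle} A \ \Rightarrow\ c[P] \xrightarrow{c[a^{\uparrow}\langle x\rangle]} c[] \circ A$

(Parallel)     $P \xrightarrow{\alpha} A \ \Rightarrow\ P \mid Q \xrightarrow{\alpha} A \diamond Q$

(Restriction)  $P \xrightarrow{\alpha} A,\ a \notin fn(\alpha) \ \Rightarrow\ (\nu a)P \xrightarrow{\alpha} (\nu a) \circ A$

(Agent)        $P \xrightarrow{\alpha} A,\ bv(\alpha) \cap dom(\sigma) = \emptyset,\ \Sigma \vdash \alpha\sigma = \alpha'\sigma,\ \tilde{a} \cap fn(\alpha') = \emptyset \ \Rightarrow\ (\nu\tilde{a})(\sigma, P) \xrightarrow{\alpha'} (\nu\tilde{a}) \circ \sigma \circ A$

(Struct)       $A \equiv B,\ B \xrightarrow{\alpha} B',\ B' \equiv A' \ \Rightarrow\ A \xrightarrow{\alpha} A'$

with

$(\nu\tilde{a})(\sigma, P) \diamond Q = (\nu\tilde{a})(\sigma, P \mid Q\sigma)$ if $fn(Q) \cap \tilde{a} = \emptyset$
$(\nu b) \circ (\nu\tilde{a})(\sigma, P) = (\nu\tilde{a}, b)(\sigma, P)$ if $b \notin \tilde{a}$
$\sigma' \circ (\nu\tilde{a})(\sigma, P) = (\nu\tilde{a})(\sigma \cup \sigma'\sigma, P)$ if $(\nu\tilde{a})(\sigma, P)$ is closed,
    for all $x$, $fn(x\sigma') \cap \tilde{a} = \emptyset$, and $dom(\sigma) \cap dom(\sigma') = \emptyset$
$c[] \circ (\nu\tilde{a})(\sigma, P) = (\nu\tilde{a})(\sigma, c[P])$ if $c \notin \tilde{a}$

$fv(M^{\eta}\langle x\rangle) = fv(M) \cup fv(\eta)$, $fv(M^{\eta}(\lambda M')) = fv(M) \cup fv(\eta) \cup fv(M')$,
$fv(M''[\alpha]) = fv(M'') \cup fv(\alpha)$ and similarly for the free names $fn$.
$bv(M^{\eta}\langle x\rangle) = \{x\}$, $bv(M^{\eta}(\lambda M')) = \emptyset$, $bv(M''[\alpha]) = bv(\alpha)$

Fig. 3. Labelled semantics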



Definition 3. We say that $(M = N)A$ if and only if there exist a substitution
$\sigma$, a process $P$, and names $\tilde{a}$ such that $A \equiv (\nu\tilde{a})(\sigma, P)$, $\Sigma \vdash M\sigma = N\sigma$, and
$\tilde{a} \cap (fn(M) \cup fn(N)) = \emptyset$.

We say that isname($M$, $A$) is true if and only if there exist a substitution
$\sigma$, a process $P$, and names $\tilde{a}$, $b$ such that $A \equiv (\nu\tilde{a})(\sigma, P)$, $\Sigma \vdash M\sigma = b$, and
$\tilde{a} \cap fn(M) = \emptyset$.

Now, two agents $A$ and $B$ are statically equivalent ($A \approx_s B$) when $dom(A) =
dom(B)$, and for all terms $M$ and $N$, $(M = N)A$ if and only if $(M = N)B$, and
isname($M$, $A$) if and only if isname($M$, $B$).

The condition “$(M = N)A$ if and only if $(M = N)B$” takes into account
that the adversary can test equality between terms. The property isname($M$, $A$)
means that $M$ is equal to a name, taking into account the values of variables
given by the substitution of $A$. The condition “isname($M$, $A$) if and only if
isname($M$, $B$)” models the adversary’s ability to check whether $M$ is a name, by
using it as a channel: the process $M^{*}(\lambda x).a^{*}\langle b\rangle \mid M^{*}\langle b\rangle$ emits on channel $a$ if and
only if $M$ is equal to a name. Another possibility is to authorize communications
when channels and locations are any terms instead of just names. With that
semantics, the condition on isname($M$, $A$) would be removed.
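Static equivalence quantifies over all terms, so it cannot be checked by enumeration in general. The sketch below is only a bounded approximation under explicit assumptions: the adversary's terms are drawn from a finite list of candidate recipes, both agents restrict the same set of names, terms use the illustrative tuple encoding with the shared-key equation as a rewrite rule, and all function names are hypothetical. It can refute static equivalence by exhibiting a distinguishing test, but a positive answer only holds for the recipes tried.

```python
from itertools import product

def normalize(term):
    """Rewrite a term bottom-up with decrypt(encrypt(x, y), y) -> x."""
    if isinstance(term, str):
        return term
    head, *args = term
    args = [normalize(a) for a in args]
    if head == "decrypt" and len(args) == 2:
        cipher, key = args
        if (isinstance(cipher, tuple) and len(cipher) == 3
                and cipher[0] == "encrypt" and cipher[2] == key):
            return cipher[1]
    return (head, *args)

def apply_subst(sigma, term):
    if isinstance(term, str):
        return sigma.get(term, term)
    head, *args = term
    return (head, *(apply_subst(sigma, a) for a in args))

def atoms(term):
    """All names and variables occurring in a term."""
    if isinstance(term, str):
        return {term}
    return set().union(*(atoms(a) for a in term[1:]))

def bounded_static_equiv(agent_a, agent_b, recipes):
    """Compare two agents (restricted_names, sigma) on the given recipes only."""
    (restricted, sigma_a), (_, sigma_b) = agent_a, agent_b
    if set(sigma_a) != set(sigma_b):
        return False
    # The adversary may not mention restricted names in its own terms.
    usable = [r for r in recipes if not atoms(r) & restricted]
    for m, n in product(usable, repeat=2):
        eq_a = normalize(apply_subst(sigma_a, m)) == normalize(apply_subst(sigma_a, n))
        eq_b = normalize(apply_subst(sigma_b, m)) == normalize(apply_subst(sigma_b, n))
        if eq_a != eq_b:
            return False
    for m in usable:
        name_a = isinstance(normalize(apply_subst(sigma_a, m)), str)
        name_b = isinstance(normalize(apply_subst(sigma_b, m)), str)
        if name_a != name_b:
            return False
    return True

recipes = ["x", "a", "b", "k", ("decrypt", "x", "k"), ("encrypt", "a", "k")]
secret_key_a = ({"k"}, {"x": ("encrypt", "a", "k")})
secret_key_b = ({"k"}, {"x": ("encrypt", "b", "k")})
public_key_a = (set(), {"x": ("encrypt", "a", "k")})
public_key_b = (set(), {"x": ("encrypt", "b", "k")})
```

When `k` is restricted, none of the candidate recipes separates the two ciphertexts; when `k` is public, the test decrypt(x, k) = a holds for one agent and fails for the other.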

Finally, we can define the labelled bisimilarity: 

Definition 4. Labelled bisimilarity ($\approx_l$) is the largest symmetric relation $\mathcal{R}$ on
closed agents such that $A \mathrel{\mathcal{R}} B$ implies:

(1) $A \approx_s B$;

(2) if $A \to A'$ then there exists $B'$ such that $B \to^{*} B'$ and $A' \mathrel{\mathcal{R}} B'$;

(3) if $A \xrightarrow{\alpha} A'$, then there exists $B'$ such that $B \to^{*} \xrightarrow{\alpha} \to^{*} B'$ and $A' \mathrel{\mathcal{R}} B'$.

Theorem 1. Labelled bisimilarity equals observational equivalence: $P \approx_l Q$ if
and only if $P \approx Q$.

Note that, in higher-order process calculi, the characterization of observa- 
tional equivalence as labelled bisimilarity is often very difficult [18], because,
when the adversary receives a process, the only observation it can make is to
execute that process. In our calculus, the adversary can extract data and compare
the code of processes (the existence of the function getdata is fundamental for 
that, see Sect. 2), much like it can for any other piece of data. Executing the 
process does not give more information to the adversary than these operations. 
Hence, being able to send processes does not complicate the labelled bisimilarity. 
In contrast, the presence of locations complicates the proof of Th. 1 considerably. 



4 Examples 

In this section, we present three examples illustrating the features of our calculus. 



4.1 Firewalls 

Firewalls exist as gateways through which communications with external net- 
works are controlled (or restricted) . The concept of a firewall corresponds directly 
to the concept of a location. Processes within a firewall can only communicate 
with the external context when the firewall allows it. Such communications re- 
quire a prior knowledge of the address (name) of the firewall. 

A well-known property that characterizes perfectly isolating firewalls is ex- 
pressed by the perfect firewall equation [12]: 

Proposition 1. Let P be a closed process. Then $(\nu c)c[P] \approx 0$. 

This result states that the firewall $(\nu c)c[P]$ is perfectly isolating because its 
internal behaviour is non-observable due to the hiding effect of $(\nu c)$. (This can be 
proved using labelled bisimilarity and noticing that $(\nu c)c[P]$ and all its reductions 
by the standard reduction relation do not reduce by the labelled reduction.) 
When P has free variables, it is easy to show, using the above proposition, that 
$(\nu c)c[P]$ preserves the secrecy of all its free variables, since $(\nu c)c[P\sigma] \approx 0 \approx (\nu c)c[P\sigma']$ 
for all closed substitutions $\sigma$ and $\sigma'$ of domain $\mathit{fv}(P)$. 

In practice, a firewall often leaves some channels open for communications 
by extending the scope of the restriction (vc): 

$R \triangleq (\nu c)(c[P] \mid {!a^{*}(\lambda x).a^{c}\langle x\rangle} \mid {!b^{c}(\lambda x).b^{*}\langle x\rangle}) \mid Q$ 

With the two replicated processes aware of the name c, messages may be for- 
warded over channel a from Q to P, and from P to Q over channel b. Other 
communications are forbidden by the firewall. Hence, for example: 

Proposition 2. Let $P = b^{\uparrow}\langle y\rangle \mid d^{\uparrow}\langle z\rangle$. Then R{M/y} preserves the secrecy of 
z, and R{M/z} does not preserve the secrecy of y. 

Indeed, y is sent to the outside of the firewall on channel b. Another form of 
firewalls is built by using nested locations: 

Proposition 3. $c[c'[P]] \approx 0$. 

Two nested locations prevent communications because communications are pos- 
sible only between locations immediately inside one another. We can allow some 
communications by extending the scope of the location c, to include some relay 
processes. For example, we can model a computer as a location c with processes 
running inside it as locations $c_1, \ldots, c_n$ by the process: 

$c[c_1[P_1] \mid \ldots \mid c_n[P_n] \mid P]$ 

The process P models the operating system. Then distinct processes can commu- 
nicate with the network outside the computer and with each other only through 
the operating system P acting as the intermediary. 

We can also model a Local Area Network (LAN) as a location c, containing 
several locations $c_1, \ldots, c_n$, each representing a computer, by the same process. 
The process P then models the routing of messages inside the LAN and the 
communications with the outside. (Realistically, P should simply forward all 
communications from a location inside the LAN to another location inside the 
LAN, with only some communications forwarded to the outside.) 

4.2 Applets 

The pack and exec primitives for packing processes and extracting them from 
messages allow for the modelling of applets. For example, consider the following 
sequence of messages exchanged between a client C and a server S: 

C → S : “Send me your applet” 

S → C : $\{\mathrm{pack}(M, \lambda y.P)\}_{sk_S}$ 

where digital signatures, used in the second message, require the function sign 
to sign a message, checksign to check it, extract to extract the message from 
a signed message, and pk to create a public key from its secret part. These are 
defined by the equations: 

$\mathrm{checksign}(\mathrm{sign}(M, sk), \mathrm{pk}(sk)) = \mathrm{True}$ 

$\mathrm{checksign}(M', \mathrm{pk}(sk)) = \mathrm{False}$ if $M' \neq \mathrm{sign}(M, sk)$ for all M 

$\mathrm{extract}(\mathrm{sign}(M, sk)) = M$ 
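
The three equations above can be read as a small symbolic term algebra. The following is a minimal sketch under that reading, not the paper's formal model: the class names `Sign` and `Pk` are hypothetical, and raising an error for `extract` on an unsigned term is an assumption standing in for "no equation applies".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sign:
    """Symbolic term sign(M, sk)."""
    message: object
    secret_key: str


@dataclass(frozen=True)
class Pk:
    """Symbolic term pk(sk)."""
    secret_key: str


def sign(message, secret_key):
    return Sign(message, secret_key)


def pk(secret_key):
    return Pk(secret_key)


def checksign(term, public_key):
    # True exactly when term = sign(M, sk) and public_key = pk(sk);
    # False for every other term, as in the second equation.
    return (
        isinstance(term, Sign)
        and isinstance(public_key, Pk)
        and term.secret_key == public_key.secret_key
    )


def extract(term):
    # extract(sign(M, sk)) = M; no equation covers other terms (assumption:
    # we signal this with an exception).
    if not isinstance(term, Sign):
        raise ValueError("extract is only defined on signed terms")
    return term.message
```

Note that `extract` does not need the key: as in the equations, anyone can read a signed message, while only `checksign` ties it to a public key.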

The system can then be represented by the following processes: 

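$R \triangleq (\nu\, sk_S)\ \text{let } pk_S = \mathrm{pk}(sk_S) \text{ in } (c^{*}\langle pk_S\rangle \mid C \mid S)$ 

$C \triangleq c^{*}\langle \mathit{sndapp}\rangle.c^{*}(\lambda x).\text{if } \mathrm{checksign}(x, pk_S) = \mathrm{True} \text{ then } \mathrm{exec}(\mathrm{extract}(x))$ 

$S \triangleq {!c^{*}(\lambda x)}.\text{if } x = \mathit{sndapp} \text{ then } c^{*}\langle \mathrm{sign}(\mathrm{pack}(M, \lambda y.P), sk_S)\rangle$ 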

The process R first creates a new secret key for the server $sk_S$, then computes 
the corresponding public key $pk_S$, and publishes this key by $c^{*}\langle pk_S\rangle$, so that the 
adversary can have it. Finally, it executes the client and server processes (C and 
S) according to the message sequence above. 

The client is assured of the origin of the applet when it succeeds in verifying 
the signature using checksign (assuming the public key $pk_S$ is validly bound to 
the server S at the time when checksign is called). It then executes the applet. 
Otherwise, if the applet is not signed, or it is signed with a key different from 
$sk_S$, then the client terminates. 

One may arrive at the following result, which shows that the only applet that 
can be executed by R is pack(M, λy.P): 

Proposition 4. Let R' be obtained from R by substituting “if extract(x) = 
pack(M, λy.P) then P{M/y}” for “exec(extract(x))” in C. Then $R \approx R'$. 

The key lemma to prove this result is to show that the only packed process that 
can be the parameter of exec is the one signed by the server. 

Other protocols to obtain applets are possible. We have also modelled an 
example in which an applet is sent encrypted, using a simple public-key protocol. 
We could also model an SSL communication on which an applet is sent. 

Applets that are communicated by secure means (encryption or signatures) 
as shown in the previous example are often trusted to a certain degree, depending 
on the level of trust associated with the public key of the server $pk_S$. However, 
applets received through insecure communications (non-trusted applets) are run 
in special sandboxes that limit their capabilities. For example, such applets are 
prevented from accessing the local file system or contacting network addresses 
(except the address from which the applet originated). 

Consider again the client C in the signed applet example above. This time it 
is enhanced with a sandbox sb[...], within which applets not originating from the 
server S are run. On the other hand, trusted applets are given more privileges 
by running them outside sb[...]: 

$C \triangleq (c^{*}\langle \mathit{sndapp}\rangle.c^{*}(\lambda x).\text{if } \mathrm{checksign}(x, pk_S) = \mathrm{True} \text{ then } \mathrm{exec}(\mathrm{extract}(x))$ 

$\quad \text{else } (\nu\, sb)(sb[\mathrm{exec}(\mathrm{extract}(x))] \mid P)) \mid Q$ 

Process P models the Java standard library. Its communications with the applet 
correspond to method calls and returns. The process P can communicate with 
the outside of the sandbox, and for instance send messages coming from the 
applet on the network (perhaps after examining them for security). Process 
Q, on the other hand, is protected from the applet by the fact that it falls 
outside the scope of the sandbox. Prop. 4 can be applied to the first occurrence 
of exec(extract(x)) in C, and the sandbox prevents direct communications 
between the applet and the outside world: all these communications must go 
through P, since one needs to know sb to communicate with the interior of the 
sandbox. (Obviously, we assume that P is disciplined so as not to reveal the 
name sb.) 
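
The dispatch performed by the sandboxed client can be sketched as follows. This is only an illustration of the control flow, under several assumptions that are not in the source: signed terms are tuples `("sign", M, sk)`, an applet is a Python callable returning the messages it wants to send, and the library process P is reduced to a hypothetical predicate `library_allows` that decides which messages it forwards.

```python
def is_signed(term):
    # A signed term is modelled as the tuple ("sign", message, secret_key).
    return isinstance(term, tuple) and len(term) == 3 and term[0] == "sign"


def checksign(term, public_key):
    # Public keys are modelled as ("pk", secret_key).
    return is_signed(term) and public_key == ("pk", term[2])


def run_client(received, server_pk, library_allows):
    """Return (mode, messages that reach the outside world)."""
    if not is_signed(received):
        # extract(x) is undefined on unsigned terms, so nothing runs.
        return ("stuck", [])
    applet = received[1]            # extract(x)
    requests = applet()             # exec(...): the applet emits its messages
    if checksign(received, server_pk):
        # Trusted applet: runs outside the sandbox, no filtering.
        return ("trusted", list(requests))
    # Untrusted applet: runs inside sb[...]; only P can relay its messages.
    return ("sandboxed", [m for m in requests if library_allows(m)])
```

The point of the sketch is the last branch: every message of an untrusted applet passes through the library, mirroring the fact that only P knows the name sb.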

4.3 Certified Email Protocol 

We consider here a cryptographic protocol whose implementation relies on a 
Java applet: the email protocol of [5]. In this protocol, a sender S sends an 
email message to a receiver R, such that R has the message if and only if S has 
a receipt. The protocol also involves a trusted third party TTP. In a slightly 
simplified version, the protocol runs as follows: 

Message 1. S → R: “This is a certified email. 
           Please read it in a Java enabled browser” 
           <APPLET CODE="https://ttp.com/applet.jar" 
           PARAM=(cleartext, em, S2TTP)> 

Message 2. R → TTP: get https://ttp.com/applet.jar 

Message 3. TTP → R: applet.jar 

Message 4. R → TTP: S2TTP, “owner of RPwd wants key for h” 

Message 5. TTP → R: “try k for h” 

Message 6. TTP → S: sign(“I have released the key for S2TTP to R”, $sk_{TTP}$) 

The sender S wants to send to R the message m. It creates a fresh key k, encrypts 
m with k, obtaining em = encrypt(m, k). It also computes h = H((cleartext, em)) 
and S2TTP = pencrypt((S, “give k to R for h”), $pk_{TTP}$), where pencrypt is 
public-key encryption and H is a one-way hash function. Then it sends to R an 
html email (message 1) containing a header explaining how to read the email, 
as well as a reference to an applet https://ttp.com/applet.jar taking as parameters 
(cleartext, em, S2TTP), where cleartext is an explanation of the contents 
of the certified e-mail. The reader’s browser downloads the applet on a secure 
SSL connection (messages 2 and 3) and executes it with the indicated parame- 
ters. The applet displays cleartext to the receiver, and asks him for his password 
RPwd. If R decides to read the email, it enters its password RPwd and the applet 
sends message 4 to TTP on a secure channel built by a combination of public-key 
and shared-key encryption. The value of h in this message is recomputed by the 
applet from cleartext and em. TTP replies to R with message 5 on the same secure 
channel as message 4. Upon receipt of message 5, the applet decrypts em with 
k, and displays m. TTP also sends the receipt (message 6) to S. The delivery of 
messages 5 and 6 is guaranteed. We have formally modeled this protocol in our 
calculus, by combining the model of [2] (which omitted the applet) and a model 
of the applet along the same lines as the previous example. The resulting model 
is 229 lines long (including comments) so we do not detail it here due to the lack 
of space. 

When one assumes that the applet behaves as expected, the protocol satisfies 
its security properties [2,7]. However, when one takes into account the download- 
ing of the applet, one finds the following attack, which was mentioned to us by 
the authors of the protocol: in message 1, S sends the address of an applet on its 
own web site. This applet behaves as expected, except that it never displays the 
clear message to R. Because of the security constraints on applets, the applet 
can contact only the web site it comes from. However, this web site will simply 
forward the messages to TTP. Then the protocol runs as expected, the sender 
gets its receipt in message 6, but the receiver never has the clear message because 
the applet does not display it. This contradicts the main security property of 
the protocol. 
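
The protocol run and the attack can be replayed symbolically. The sketch below is a simplification under stated assumptions, not the 229-line model mentioned above: terms are Python tuples, the password RPwd and the secure channels are omitted, the request inside S2TTP is encoded as a tuple, and all function names (`make_ttp`, `honest_applet`, `malicious_applet`) are hypothetical.

```python
def encrypt(m, k):
    return ("enc", m, k)


def decrypt(c, k):
    if c[0] == "enc" and c[2] == k:
        return c[1]
    raise ValueError("wrong key")


def H(x):
    return ("hash", x)


def pencrypt(m, public_key):
    return ("penc", m, public_key)


def pdecrypt(c, secret_key):
    if c[0] == "penc" and c[2] == ("pk", secret_key):
        return c[1]
    raise ValueError("wrong key")


def sender(m, k, cleartext, pk_ttp):
    """Build the parameters carried by message 1."""
    em = encrypt(m, k)
    h = H((cleartext, em))
    s2ttp = pencrypt(("S", k, "R", h), pk_ttp)  # "give k to R for h"
    return cleartext, em, s2ttp


def make_ttp(sk_ttp, receipts):
    """TTP answers message 4 with k (message 5) and records a receipt (message 6)."""
    def ttp(s2ttp, h_from_receiver):
        _sender, k, receiver, h = pdecrypt(s2ttp, sk_ttp)
        if h != h_from_receiver:
            raise ValueError("hash mismatch")
        receipts.append(("sign", ("released key", s2ttp, receiver), sk_ttp))
        return k
    return ttp


def honest_applet(cleartext, em, s2ttp, ttp):
    k = ttp(s2ttp, H((cleartext, em)))  # h is recomputed by the applet
    return decrypt(em, k)               # the clear message shown to R


def malicious_applet(cleartext, em, s2ttp, ttp):
    ttp(s2ttp, H((cleartext, em)))      # same messages reach TTP ...
    return None                         # ... but nothing is displayed to R
```

With the malicious applet, TTP still issues a receipt although the receiver sees no message, which is exactly the violation of "R has the message if and only if S has a receipt" described above.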

One can see several corrections to this problem, so that the applet can come 
only from TTP. The easiest one would be for the receiver to manually check the 
address of the applet. Since the address contains https, the applet is downloaded 
using SSL, so we have the guarantee that the obtained applet is the expected 
one, as mentioned in the previous example. A result similar to Prop. 4 formalizes 
this guarantee. Then we can use the security properties of the protocol without 
applet to show the security of the protocol with applet. We could also require 
that the applet be signed by TTP. However, browsers will run unsigned 
applets, so the receiver would need a specific piece of software that checks the signature. 
In that case, it is probably simpler not to use an applet at all: R should simply 
have downloaded the software that executes the protocol. 

In this example, our calculus makes it possible to formally model the protocol 
including the applet and a combination of mobility and cryptography (the applet 
is sent on an SSL connection and also performs cryptographic operations itself). 
An attack appears in this model while it does not in usual models of protocols 
that cannot represent applets. Indeed, this attack relies crucially on furnishing a 
malicious applet to the receiver. We can also prove the correctness of corrected 
versions of the protocol. 

5 Comparison with Other Process Calculi 

Models of Cryptography. Several variants of the spi calculus [6] have been pro- 
posed for modelling cryptography. Here, we focus on the most powerful of these 
variants, the applied pi calculus [3]. We extend this calculus by adding locations 
and packed processes. There are other relatively important differences between 
our calculus and the applied pi calculus. We do not introduce floating substitu- 
tions and restriction on variables in the calculus, since these are useful only for 
the labelled bisimilarity. We prefer keeping the calculus as standard and simple 
as possible, even if the proof of Th. 1 requires a few more technical lemmas. We 
also remove the “open” rules in the labelled bisimilarity, and do not resort to 
a sort system to prevent channel names from being manipulated by functions. 
To compensate for this, our static equivalence takes into account that the 
adversary can test whether a term is a name by using it as a channel. 

Models of Location-Based Mobility. One of the most well-known models of mobility 
is the ambient calculus [12]. In this calculus, locations can move according to 
their own will and open other ambients. It is very difficult to restrict these move- 
ments. This makes the modelling of security properties difficult. Several variants 
have been designed to solve this problem: the safe ambients [16], in which each 
movement or opening of an ambient must be authorized by a co-capability, the 
secure safe ambients [9], which are a variant of safe ambients with a security 
type system, and the boxed ambients [10] in which the opening of ambients is 
removed and replaced with non-local communications between ambients. We use 
the communication mechanism of boxed ambients, but not the ambient mobility 
model: we believe that objective mobility (with processes sent as messages) is 
closer to what happens most often in networks. In contrast, ambient calculi are 
probably better at modelling situations in which human beings move security 
devices, for example. 

Boxed ambients and cryptography have already been combined in [11]. The 
approach differs from our work in many aspects: [11] considers only shared-key 
cryptography; it represents messages as moving ambients, which, we think, is 
less natural than considering them as sent messages; it gives a type system for 
proving secrecy, whereas we focus on the proof of observational equivalence. 

The seal calculus [20] provides strong objective mobility: processes are sent on 
the network by specific input and output constructs. We focus on weak mobility, 
and simplify the calculus by using the ordinary input and output instructions 
to manipulate packed processes. In the original version [20], the seal calculus 
used an open construct, combined with a local input or output, to create 
non-local communications. In a recent draft, G. Castagna, F. Nardelli, and J. Vitek 
have adopted a simpler communication mechanism similar to the one of boxed 
ambients, which we adopt. 

The Dπ-calculus of [17] includes a notion of migration of the current location 
(as in ambients) but also features such as halted and running locations, that we 
omit from our calculus for simplicity. In the distributed π-calculus of [19], each 
subterm is explicitly located, instead of having locations contain a process, and 
communications can take place between different locations without constraints. 

The Join Calculus. The previous models are all based on the pi calculus. An- 
other process calculus that can be used for mobility is the join calculus [13]. It 
has the same expressive power as the pi calculus, but uses asymmetric channels 
with one receiver instead of symmetric channels. (One type of channel can en- 
code the other.) It has also been extended to cryptography, similarly to the spi 
calculus [4]. The distributed join calculus [15] considers a tree of nested locations, 
with subjective mobility (but no cryptography). 

6 Conclusion 

In this paper, we have presented crypto-loc, a process calculus combining the 
concepts of cryptography, locations, and weak code mobility. We have also de- 
fined a theory of process equivalence, and showed that observational equivalence 
equals labelled bisimilarity. The theory was used in defining and proving security 
properties. 

An interesting area for further research would involve extending automatic 
static analysis techniques that have already been developed for cryptographic 
protocols to this calculus. Indeed, the presence of locations, and of communi- 
cation and cryptographic operations on code raises new difficulties for static 
analysis. 

Abstract. Resource control has attracted increasing interest in foundational re- 
search on distributed systems. This paper focuses on space control and develops 
an analysis of space usage in the context of an ambient-like calculus with bounded 
capacities and weighed processes, where migration and activation require space. 
A type system complements the dynamics of the calculus by providing static 
guarantees that the intended capacity bounds are preserved throughout the com- 
putation. 



Introduction 

Emerging computing paradigms, such as Global Computing and Ambient Intelligence, 
envision scenarios where mobile devices travel across domain and network boundaries. 
Current examples include smart cards, embedded devices (e.g. in cars), mobile 
phones, and PDAs, and the list keeps growing. The notion of third-party resource usage will 
rise to a central role, as roaming entities will need to borrow resources from host networks 
and, in turn, provide guarantees of bounded resource usage. This is the context of 
the present paper, which focuses on space consumption and capacity bound awareness. 

Resource control, in diverse incarnations, has recently been the focus of founda- 
tional research. Topics considered include the ability to read from and to write to a 
channel [15], the control of the location of channel names [18], the guarantee that dis- 
tributed agents will access resources only when allowed to do so [8, 14, 1, 6, 7]. Specific 
work on the certification of bounds on resource consumption includes [9], which introduces 
a notion of resource type representing an abstract unit of space, and uses a linear 
type system to guarantee linear space consumption; [4] where quantitative bounds on 
time usage are enforced using a typed assembly language; and [11], which puts forward 
a general formulation of resource usage analysis. 

We elect to formulate our analysis of space control in an ambient-like calculus, 
BoCa, because the notion of ambient mobility is a natural vehicle to address the in- 
tended application domain. Relevant references to related work in this context include 
[5], which presents a calculus in which resources may be moved across locations pro- 
vided suitable space is available at the target location; [17], which uses typing systems 
to control resource usage and consumption; and [3], which uses static techniques to 
analyse the behaviour of finite control processes, i.e., those with bounded capabilities 
for ambient allocation and output creation. 

BoCa relies on a physical, yet abstract, notion of “resource unit” defined in terms 
of a new process constructor, noted _ (read “slot”), which evolves out of the homonymous 
notion of [5]. A slot may be interpreted as a unit of computation space to be allocated 
to running processes and migrating ambients. To exemplify, the configuration 

$P \mid \underbrace{\_ \mid \ldots \mid \_}_{k \text{ times}}$ 

represents a system which is running process P and which has k resource units available 
for P to spawn new subprocesses and to accept migrating agents willing to enter. In both 
cases, the activation of the new components is predicated to the presence of suitable 
resources: only processes and agents requiring cumulatively no more than k units may 
be activated on the system. As a consequence, process activation and agent migration 
involve a protocol to “negotiate” the use of resources with the enclosing, resp. receiving, 
context (possibly competing with other processes). 

For migrating agents this is accounted for by associating each agent with a tag 
representing the space required for activation at the target context, as in $a^{k}[P]$. A notion 
of well-formedness will ensure that k provides a safe estimate of the space needed 
by a[P]; namely, the number of resource units allocated to P. Correspondingly, the 
negotiation protocol for mobility is represented formally by the following reductions 
(where $\_^{k}$ is short for $\_ \mid \ldots \mid \_$, k times): 

$a^{k}[\mathsf{in}\ b.P \mid Q] \mid b[\_^{k} \mid R] \searrow \_^{k} \mid b[a^{k}[P \mid Q] \mid R]$ 

$\_^{k} \mid b[P \mid a^{k}[\mathsf{out}\ b.Q \mid R]] \searrow a^{k}[Q \mid R] \mid b[P \mid \_^{k}]$ 

In both cases, the migrating agent releases the space required for its computation at the 
source site and gets corresponding space at the target context. Notice that the reductions 
construe _ both as a representation of the physical space available at the locations of 
the system, and as a particular new kind of co-capability. 
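
At the level of slot counts only, both mobility reductions follow the same pattern, which can be sketched as below. This is a deliberately coarse abstraction and an assumption of this note: processes are ignored, each location is reduced to its number of free slots, and the function name `move` is hypothetical.

```python
def move(k, free_at_origin, free_at_destination):
    """Slot accounting for an agent of weight k changing location.

    The move is enabled only if the destination offers k free slots;
    the agent then occupies them and releases k slots at its origin.
    Returns the new (origin, destination) free-slot counts, or None
    when the reduction is not enabled.
    """
    if free_at_destination < k:
        return None
    return free_at_origin + k, free_at_destination - k
```

For entering, the origin is the level where the agent currently sits and the destination is the inside of the target ambient; for exiting, the roles are swapped. In both cases the total number of slots is unchanged.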

Making the weight of an ambient depend explicitly on its contents allows a clean 
and simple treatment of the open capability: opening does not require resources, as 
those needed to allocate the contents are exactly those taken by the ambient as such. 

$\mathsf{opn}\ a.P \mid a[\overline{\mathsf{opn}}.Q \mid R] \searrow P \mid Q \mid R$ 


Notice that in order for these reductions to provide the intended semantics of resource 
negotiation, it is crucial that the redexes are well-formed. Accordingly, the dynamics 
of ambient mobility is inherently dependent on the assumption that all migrating 
agents are well-formed. As we shall discuss, this assumption is central to the definition 
of behavioural equivalence as well. 

Resource management and consumption does not concern exclusively mobility, as 
all processes need and use space. It is natural then to expect that “spawning” (activating) 
processes requires resources, and that unbounded replication of processes is controlled 
so as to guard against processes that may consume an infinite amount of resources. 
The action of spawning a new process is made explicit in BoCa by introducing a new 
process construct, $k \triangleright$, whose semantics is defined by the following reduction: 

$k \triangleright P \mid \_^{k} \searrow P$ 

Here $k \triangleright P$ is a “spawner” which launches P provided that the local context is ready to 
allocate enough fresh resources for the activation. The tag k represents the “activation 
cost” for process P, viz. its weight, while $k \triangleright P$, the “frozen code” of P, weighs 0: 
again here the hypothesis of well-formedness of terms is critical to make sense of the 
spawning protocol. The adoption of an explicit spawning operator allows us to delegate 
to the “spawner” the responsibility of resource control in the mechanism for process 
replication. In particular, we restrict the replication primitive “!” to 0-weight processes 
only. We can then rely on the usual congruence rule that identifies !P with !P | P, and 
use $!(k \triangleright P)$ to realise a resource-aware version of replication. This results in a system 
which separates process duplication and activation, and so allows a fine analysis of 
resource consumption in computation. 

BoCa is completed by two constructs that provide for dynamic allocation of re- 
sources. In our approach resources are not “created” from the void, but rather acquired 
dynamically - in fact, transferred - from the context, again as a result of a negotiation. 

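$a^{k+1}[\mathsf{put}.P \mid \_ \mid Q] \mid b^{h}[\mathsf{get}\ a.R \mid S] \searrow a^{k}[P \mid Q] \mid b^{h+1}[R \mid \_ \mid S]$ 

$\mathsf{put}^{\downarrow}.P \mid \_ \mid a^{k}[\mathsf{get}^{\uparrow}.Q \mid R] \searrow P \mid a^{k+1}[\_ \mid Q \mid R]$ 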

Resource transfer is realised as a two-way synchronisation in which a context offers 
some of its resource units to any enclosed or sibling ambient that makes a corresponding 
request. The effect of the transfer is reflected in the tags that describe the resources 
allocated to the receiving ambients. We formalise slot transfers only between siblings 
and from father to child. As we shall see, transfers across siblings make it possible to 
encode a notion of private resource, while transfer from child to parent can easily be 
encoded in terms of the existing constructs. 

The semantic theory of BoCa is supported by a labelled transition system which 
gives rise to a bisimulation congruence adequate with respect to barbed congruence. 
Besides enabling powerful co-inductive characterizations of process equivalences, the 
labelled transition system yields an effective tool for contextual reasoning on process 
behavior. More specifically, it enables a formal representation of open systems, in which 
processes may acquire resources and space from their enclosing context. Due to the 
lack of space, here we only discuss the notion of barb, leaving the presentation of the 
transition system to the forthcoming full version of the paper. We will focus, instead, 
on BoCa’s capacity types, a system of types that guarantees capacity bounds on computational 
ambients. Precisely, given lower and upper bounds for ambient capacities, 
the system enables us to certify statically the absence of under/over-flows, potentially 
arising from an uncontrolled use of dynamic space allocation capabilities. 

We remark that our approach is typical of a way to couple language design with 
type analysis very useful in frameworks like Global Computing, where it is ultimately 
unrealistic to assume acquaintance with all the entities which may in the future interact 
with us, as is usually done for standard type systems. The openness of the network and 
its very dynamic nature deny us any substantial form of global knowledge. Therefore, 
syntactic constructs must be introduced to support the static analysis, as e.g., our “ne- 
gotiation” protocols. In our system, the possibility of dynamically checking particular 
space constraints is a consequence of the explicit presence of the primitive _. A further 
reason to avoid resource control mechanisms in ambient-like calculi mainly based on 
static typing systems is that they tend, as perfectly illustrated in [17], to require 3-way 
synchronisations which, as explained in [16], make the calculus cumbersome. 

Structure of the paper. In § 1 we give the formal description of BoCa and its operational 
semantics, and we illustrate it with a few examples. The type system for the calculus is 
illustrated in §2. In §3 we discuss the issue of resource interference, and we extend the 
calculus to deal with private resources in the form of named slots. 

1 The Calculus BoCa 

The calculus is a conservative extension of the Ambient Calculus. We presuppose two 
mutually disjoint sets: $\mathcal{N}$ of names, and $\mathcal{V}$ of variables. The set $\mathcal{V}$ is ranged over 
by letters at the end of the alphabet, typically x, y, z, while a, b, c, d, n, m range over $\mathcal{N}$. 
Finally, h, k and other letters in the same font denote integers. The syntax of the calculus 
is defined below, with $\pi$ and $\chi$ types as introduced in §2. 

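Definition 1 (Preterms and Terms). The set of process preterms is defined by the 
following productions (where we assume k > 0): 

Processes    $P ::= \_ \mid 0 \mid M.P \mid P \mid P \mid M^{k}[P] \mid {!P} \mid (\nu a : \pi)P \mid k \triangleright P \mid (x : \chi)P \mid \langle M\rangle P$ 
Capabilities $C ::= \mathsf{in}\ M \mid \mathsf{out}\ M \mid \mathsf{opn}\ M \mid \mathsf{get}\ M \mid \mathsf{get}^{\uparrow} \mid \overline{\mathsf{opn}} \mid \mathsf{put} \mid \mathsf{put}^{\downarrow}$ 
Messages     $M ::= \mathsf{nil} \mid a \mid x \mid C \mid M.M$ 

A (well-formed) term P is a preterm such that $w(P) \neq \bot$, where $w : \text{Processes} \rightharpoonup \omega$ is 
the partial weight function defined as follows: 

$w(0) = 0 \qquad w(\_) = 1 \qquad w(P \mid Q) = w(P) + w(Q)$ 

$w(M.P) = w((x : \chi)P) = w(\langle M\rangle P) = w((\nu a : \pi)P) = w(P)$ 

$w(a^{k}[P]) = \text{if } w(P) \text{ is } k \text{ then } k \text{ else } \bot$ 

$w(k \triangleright P) = \text{if } w(P) \text{ is } k \text{ then } 0 \text{ else } \bot$ 

$w({!P}) = \text{if } w(P) \text{ is } 0 \text{ then } 0 \text{ else } \bot$ 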
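
The weight function translates directly into a recursive check. The sketch below assumes a hypothetical encoding of preterms as nested tuples (the tags `"nil"`, `"slot"`, `"par"`, `"prefix"`, `"input"`, `"output"`, `"new"`, `"amb"`, `"spawn"`, `"bang"` are not from the paper), ignores types, and uses `None` for the undefined case.

```python
def weight(p):
    """Partial weight function w; returns None where w is undefined."""
    tag = p[0]
    if tag == "nil":                      # w(0) = 0
        return 0
    if tag == "slot":                     # w(_) = 1
        return 1
    if tag == "par":                      # w(P | Q) = w(P) + w(Q)
        left, right = weight(p[1]), weight(p[2])
        return None if left is None or right is None else left + right
    if tag in ("prefix", "input", "output", "new"):
        return weight(p[2])               # weight of the continuation
    if tag == "amb":                      # ("amb", name, k, body)
        _, _name, k, body = p
        return k if weight(body) == k else None
    if tag == "spawn":                    # ("spawn", k, body): frozen code weighs 0
        _, k, body = p
        return 0 if weight(body) == k else None
    if tag == "bang":                     # replication only of 0-weight terms
        return 0 if weight(p[1]) == 0 else None
    raise ValueError(f"unknown preterm tag: {tag!r}")


def is_well_formed(p):
    return weight(p) is not None
```

For instance, an ambient tagged 1 that contains two slots is rejected, while replicating a spawner is accepted because the spawner itself weighs 0.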

We use the standard notational conventions for ambient calculi. We omit types when 
not relevant; we write a[P] instead of $a^{k}[P]$ when the value of k does not matter; we 
use $\_^{k}$ as a shorthand for $\_ \mid \ldots \mid \_$ (k times), and similarly $C^{k}$ as a shorthand for $C.\ldots.C$ (k times). 

1.1 Reduction 

The dynamics of the calculus is defined as usual in terms of structural congruence and 
reduction (cf. Figure 1). Unlike other calculi, however, in BoCa both relations are only 
defined for proper terms, a fact we will leave implicit in the rest of the presentation. 



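Structural Congruence: $(\mid, 0)$ is a commutative monoid. 

$(\nu a)(P \mid Q) \equiv (\nu a)P \mid Q \quad (a \notin \mathit{fn}(Q))$ 
$\ldots[\,0\,] \equiv 0$ 
$(\nu a)0 \equiv 0$ 
$(\nu a)(\nu b)P \equiv (\nu b)(\nu a)P$ 
${!P} \equiv P \mid {!P}$ 
$a[(\nu b)P] \equiv (\nu b)a[P] \quad (a \neq b)$ 

Reduction: $E ::= \{\cdot\} \mid E \mid P \mid (\nu m)E \mid n^{k}[E]$ is an evaluation context. 

(ENTER)    $a^{k}[\mathsf{in}\ b.P \mid Q] \mid b[\_^{k} \mid R] \searrow \_^{k} \mid b[a^{k}[P \mid Q] \mid R]$ 
(EXIT)     $\_^{k} \mid b[P \mid a^{k}[\mathsf{out}\ b.Q \mid R]] \searrow a^{k}[Q \mid R] \mid b[P \mid \_^{k}]$ 
(OPEN)     $\mathsf{opn}\ a.P \mid a[\overline{\mathsf{opn}}.Q \mid R] \searrow P \mid Q \mid R$ 
(GETS)     $a^{k+1}[\mathsf{put}.P \mid \_ \mid Q] \mid b^{h}[\mathsf{get}\ a.R \mid S] \searrow a^{k}[P \mid Q] \mid b^{h+1}[R \mid \_ \mid S]$ 
(GETD)     $\mathsf{put}^{\downarrow}.P \mid \_ \mid a^{k}[\mathsf{get}^{\uparrow}.Q \mid R] \searrow P \mid a^{k+1}[\_ \mid Q \mid R]$ 
(SPAWN)    $k \triangleright P \mid \_^{k} \searrow P$ 
(EXCHANGE) $(x : \chi)P \mid \langle M\rangle Q \searrow P\{x := M\} \mid Q$ 
(STRUCT)   $P \equiv P' \quad P' \searrow Q' \quad Q' \equiv Q \;\Longrightarrow\; P \searrow Q$ 
(CONTEXT)  $P \searrow Q \;\Longrightarrow\; E\{P\} \searrow E\{Q\}$ 

Fig. 1. Structural Congruence and Reduction 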



The reduction relation $\searrow$ is defined according to the intuitions discussed in the 
introduction; we denote with $\searrow^{*}$ the reflexive and transitive closure of $\searrow$. Structural 
congruence is essentially standard. The assumption of well-formedness is central to 
both relations. In particular, the congruence $!P \equiv P \mid {!P}$ only holds with P a proper term 
of weight 0. Thus, to duplicate arbitrary processes we need to first “freeze” them under 
$k \triangleright$, i.e. we decompose arbitrary duplication into “template replication” and “process 
activation.” We define $!k \triangleright P \triangleq {!(k \triangleright P)}$, which gives us $!k \triangleright P \mid \_^{k} \searrow {!k \triangleright P} \mid P$. 
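
The interplay of template replication and activation can be illustrated by counting slots. The sketch below is an abstraction assumed for this note, not part of the calculus: a level is reduced to its number of free slots, the names `spawn` and `replicate` are hypothetical, and k is required to be at least 1 so that the loop terminates.

```python
def spawn(free_slots, k):
    """One spawn step: activating a process of weight k consumes k free slots.

    Returns the remaining free slots, or None if the step is not enabled.
    """
    if k < 1:
        raise ValueError("this sketch assumes k >= 1")
    return free_slots - k if free_slots >= k else None


def replicate(free_slots, k):
    """Activate as many copies as the free slots allow under a replicated spawner.

    Unfolding the replication is free (frozen code weighs 0); each
    activation is a spawn step. Returns (copies activated, slots left).
    """
    copies = 0
    while (rest := spawn(free_slots, k)) is not None:
        free_slots = rest
        copies += 1
    return copies, free_slots
```

With 5 free slots and k = 2, two copies can be activated and one slot stays free, so replication is bounded by the space the context provides.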

A few remarks are in order on the form of the transfer capabilities. The put capa- 
bility (among siblings) does not name the target ambient, as is the case for the dual 
capability get. We select this particular combination because it is the most liberal one 
for which our results hold. Of course, more stringent notions are possible, as e.g. when 
both partners in a synchronisation use each other’s names. Adopting any of these would 
not change the nature of the calculus and preserve, mutatis mutandis, the validity of our 
results. In particular, the current choice makes it easy and natural to express interesting 
programming examples (cf. the memory management in §1.3), and protocols: e.g., it 
enables us to provide a simple encoding of named (and private) resources allocated for 
spawning (cf. §3). Secondly, a new protocol is easily derived for transferring resources 
“upwards” from children to parents using the following pair of dual put and get. 

$\mathsf{get}^{\downarrow} a.P \triangleq (\nu m)(\mathsf{opn}\ m.P \mid m[\mathsf{get}\ a.\overline{\mathsf{opn}}])$, and $\mathsf{put}^{\uparrow} \triangleq \mathsf{put}$ 

Transfers affect the amount of resources allocated at different nesting levels in a sys- 
tem. We delegate to the type system of §2 to control that no nesting level suffers from 
resource over- or under-flows. The reduction semantics itself guarantees that the global 
amount of resources is preserved, as it can be proved by an inspection of the reduction 
rules. 

Proposition 1 (Resource Preservation). If $w(P) \neq \bot$ and $P \searrow Q$, then $w(Q) = w(P)$. 
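
As an illustration of the inspection argument, the sibling transfer rule can be checked by hand. The computation below is a worked instance, assuming the weight function of Definition 1 and the well-formedness of both sides; it is not a proof of the proposition.

```latex
% Well-formedness of the redex gives w(P) + w(Q) = k and w(R) + w(S) = h.
\begin{align*}
w\bigl(a^{k+1}[\mathsf{put}.P \mid \_ \mid Q] \mid b^{h}[\mathsf{get}\ a.R \mid S]\bigr)
  &= (k + 1) + h, \\
w\bigl(a^{k}[P \mid Q] \mid b^{h+1}[R \mid \_ \mid S]\bigr)
  &= k + (h + 1).
\end{align*}
% Both sides weigh k + h + 1: the slot has moved from a to b,
% and the tags of the two ambients have been updated accordingly.
```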

Two remarks about the above proposition are in order. First, resource preservation is a 
distinctive property of closed systems; in open systems, instead, a process may acquire 
new resources from the environment, or transfer resources to the environment, by exer- 
cising the put and get capabilities. Secondly, the fact that the global weight of a process 
is invariant through reduction does not imply that the amount of resources available for 
computation also is invariant. Indeed, our notion of slot is an economical way to convey 
the three different concepts of a resource being free, allocated, or wasted, according to 
the context in which _ occurs during the computation. Unguarded slots, as in 
represent resources available for spawning or mobility at a given nesting level; guarded 
slots, like M._, represent allocated resources, which may potentially be released and 
become free; and unreachable slots, like (va)in a.^ or (Va)a''[— ], represent wasted 
resources that will never be released. 

Computation changes the state of resources in the expected ways: allocated resources
may be freed, as in $\mathsf{opn}\ a \mid a[\,\overline{\mathsf{opn}}.\_ \mid P\,] \searrow \_ \mid P$; free resources may be allocated, as
in $\_ \mid 1 \triangleright M.\_ \searrow M.\_$, or wasted, as in $\mathsf{put}^{\downarrow} \mid \_ \mid (\nu a)\,a[\,\mathsf{get}^{\uparrow}\,] \searrow (\nu a)\,a[\,\_\,]$. A wasted
resource admits no further transition: in particular, it can never become free and
be re-allocated. Accordingly, while the global amount of resources is invariant through
reduction, as stated in Proposition 1, the computation of a process does in general consume
resources and leave a non-increasing amount of free and allocated resources. We
leave to future work the development of a precise analysis of resource usage based
on the characterization we just outlined, and focus on the behavioural semantics of the
calculus instead.



1.2 Behavioural Semantics 



The semantic theory of BoCa is based on barbed congruence [13], a standard equality
relation based on reduction and a notion of observability. As usual in ambient calculi,
our observation predicate, $P \downarrow_a$, indicates the possibility for process $P$ to interact with
the environment via an ambient named $a$. In Mobile Ambients (MA) this is defined as
follows:

$$(1)\qquad P \downarrow_a \;\triangleq\; P \equiv (\nu \tilde{m})(a[P'] \mid Q), \qquad a \notin \{\tilde{m}\}$$

Since no authorisation is required to cross a boundary, the presence of an ambient $a$
at top level denotes a potential interaction between the process and the environment
via $a$. In the presence of co-capabilities [12], however, the process $(\nu \tilde{m})(a[P] \mid Q)$
only represents a potential interaction if $P$ can exercise an appropriate co-capability.
The same observation applies to BoCa, as many aspects of its dynamics rely on
co-capabilities: notably, mobility, opening, and transfer across ambients. Correspondingly,
we have the following reasonable choices of observation (with $a \notin \{\tilde{m}\}$):



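$$(2)\qquad P \downarrow_a \;\triangleq\; P \equiv (\nu \tilde{m})(a[\,\overline{\mathsf{opn}}.P' \mid Q\,] \mid R)$$

$$(3)\qquad P \downarrow_a \;\triangleq\; P \equiv (\nu \tilde{m})(a[\,\_ \mid Q\,] \mid R)$$

$$(4)\qquad P \downarrow_a \;\triangleq\; P \equiv (\nu \tilde{m})(a[\,\mathsf{put}.P' \mid \_ \mid Q\,] \mid R)$$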



As it turns out, definitions (1)-(4) yield the same barbed congruence relation. Indeed,
the presence of 0-weighted ambients makes it possible to rely on the same notion of
observation as in MA, that is (1), without consequences on barbed congruence. We
discuss this in further detail below.

Our notion of barbed congruence is standard, except that we require closure by
well-formed contexts. Say that a relation $\mathcal{R}$ is reduction closed if $P \mathcal{R} Q$ and $P \searrow P'$ imply
the existence of some $Q'$ such that $Q \searrow^{*} Q'$ and $P' \mathcal{R} Q'$; it is barb preserving if $P \mathcal{R} Q$
and $P \downarrow_a$ imply $Q \Downarrow_a$, i.e. $Q \searrow^{*} \downarrow_a$.

Definition 2 (Barbed Congruence). Barbed bisimulation, noted $\simeq$, is the largest symmetric
relation on closed processes that is reduction closed and barb preserving. Two
processes $P$ and $Q$ are barbed congruent, written $P \cong Q$, if for all contexts $C[\cdot]$, preterm
$C[P]$ is a term iff so is $C[Q]$, and then $C[P] \simeq C[Q]$.

Let then $\cong_i$ be the barbed congruence relation resulting from Definition 2 and from
choosing the notion of observation as in ($i$) above (with $i \in [1..4]$).

Proposition 2 (Independence from Barbs). $\cong_i \;=\; \cong_j$ for all $i, j \in [1..4]$.

Since the relations differ only on the choice of barb, Proposition 2 is proved by just
showing that all barbs imply each other. This can be accomplished, as usual, by exhibiting
a corresponding context. For instance, to see that $\cong_3$ implies $\cong_2$ use the context
$C[\cdot] = [\cdot] \mid \mathsf{opn}\ a.b^{1}[\,\_\,]$, and note that for all $P$ such that $b$ is fresh in $P$ one has $P \downarrow_a$ in the sense of (2)
if and only if $C[P] \Downarrow_b$ in the sense of (3).

The import of the processes’ weight in the relation of behavioural equivalence is
captured directly by the well-formedness requirement in Definition 2. In particular,
processes of different weight are distinguished, irrespective of their “purely” behavioral
properties. To see that, note that any two processes $P$ and $Q$ of weight, say, $k$
and $h$ with $h \neq k$, are immediately distinguished by the context $C[\cdot] = a^{k}[\,[\cdot]\,]$, as $C[P]$
is well-formed while $C[Q]$ is not.

1.3 Examples 

We complete the presentation of the calculus with some encodings of systems and ex- 
amples in which space usage and control is modelled. 

Recovering Mobile Ambients. The Ambient Calculus [2] is straightforwardly embedded
in (an untyped version of) BoCa: it suffices to insert a process $!\overline{\mathsf{opn}}$ in all ambients.
The relevant clauses of the embedding are as follows:

$$[\![\,a[P]\,]\!] = a^{0}[\,!\overline{\mathsf{opn}} \mid [\![P]\!]\,], \qquad [\![\,(\nu a)P\,]\!] = (\nu a)[\![P]\!]$$

and the remaining ones are derived similarly; clearly all resulting processes weigh 0.

Encoding Father-Son Swap. In BoCa, as in any setting where ambients have weight,
this swap is possible only if the father and child nodes have the same weight. We
present it, for example, in the case of weight 1. Notice the use of the primitives for child
to father slot exchange that we have defined in §1.

$$b^{1}[\,\mathsf{get}^{\downarrow} a.\mathsf{put}.\mathsf{in}\ a.\mathsf{get}^{\uparrow} \mid a^{1}[\,\mathsf{put}^{\uparrow}.\mathsf{out}\ b.\mathsf{get}\ b.\mathsf{put}^{\downarrow} \mid \_\,]\,] \;\searrow^{*}\; a^{1}[\,b^{1}[\,\_\,]\,]$$

Encoding Ambient Renaming. We can represent in BoCa a form of ambient self- 
renaming capability. First, define spwQ^?''[ P ] = exp^[ out a . opn. k c> P ] ] and then 
use it to define 

a \}(t^b.P = I opn fl] | in ^j.bpn.P 

Since opn exp | | a^[ spwQ^?''[ P] | Q] \* P] | 2] where k, h are the weights 

of P and Q, respectively, we get 

a^[a\}(^b.P\ P] I — I opn exp \„, fe''[P | P] | 

So, an ambient needs to borrow space from its parent in order to rename itself. We 
conjecture that renaming cannot be obtained otherwise. 

A Memory Module. A user can take slots from a memory module MEM_MOD using 
MALLOC and release them back to MEM_MOD after their use. 

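$$\mathrm{MEM\_MOD} = \mathit{mem}[\,\_^{256\mathrm{MB}} \mid \overbrace{\mathsf{opn}\ m \mid \cdots \mid \mathsf{opn}\ m}^{256\mathrm{MB}}\,]$$

$$\mathrm{MALLOC} = m[\,\mathsf{out}\ u.\mathsf{in}\ \mathit{mem}.\overline{\mathsf{opn}}.\mathsf{put}.\mathsf{get}\ u.\mathsf{opn}\ m\,]$$

$$\mathrm{USER} = u[\,\ldots \mathrm{MALLOC} \mid \ldots \mathsf{get}\ \mathit{mem} \ldots \mathsf{put} \mid \ldots\,]$$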

2 Bounding Resources, by Typing 

In this section we discuss a type system that provides static guarantees for a simple 
behavioural property, namely the absence of space under- and over-flows arising as a 
result of transfers during the computation. To deal with this satisfactorily, we need to 
take into account that transfer (co)capabilities can be acquired by way of exchanges. 
The type of a capability will hence have to express how it affects the space of the 
ambient in which it can be performed. 

2.1 The Types

We use $\mathbb{Z}$ to denote the set of integers, and write $\mathbb{Z}^{+}$ and $\mathbb{Z}^{-}$ for the sets of non-negative
and non-positive integers respectively. We define the following domains:

Intervals       $\iota \in \mathfrak{I} = \{[n, N] \mid n, N \in \mathbb{Z}^{+},\ n \leq N\}$
Effects         $\varepsilon \in \mathcal{E} = \{(d, i) \mid d \in \mathbb{Z}^{-},\ i \in \mathbb{Z}^{+}\}$
Thread Effects  $\phi \in \Phi = \mathcal{E} \to \mathcal{E}$

Intervals and effects are ordered in the usual way, namely: $[n, N] \leq [n', N']$ when $n' \leq n$
and $N \leq N'$, and $(d, i) \leq (d', i')$ when $d' \leq d$ and $i \leq i'$. It is also convenient to define the
component-wise sum operator for effects: $(d, i) + (d', i') = (d + d', i + i')$, and lift it to $\Phi$
pointwise: $\phi_1 + \phi_2 = \lambda\varepsilon.\,\phi_1(\varepsilon) + \phi_2(\varepsilon)$.

The syntax of types is defined by the following productions:

Message Types   $W ::= \mathit{Amb}(\iota, \varepsilon, \chi) \mid \mathit{Cap}(\phi, \chi)$
Exchange Types  $\chi ::= \mathit{Shh} \mid W$
Process Types   $\pi ::= \mathit{Proc}(\varepsilon, \chi)$

Type $\mathit{Proc}(\varepsilon, \chi)$ is the type of processes with $\varepsilon$ effects and $\chi$ exchanges. Specifically, for
a process $P$ of type $\mathit{Proc}((d, i), \chi)$, the effect $(d, i)$ bounds the number of slots delivered
($d$) and acquired ($i$) by $P$ as the cumulative result of exercising $P$’s transfer capabilities.

Type $\mathit{Amb}(\iota, \varepsilon, \chi)$ is the type of ambients with weight ranging in $\iota$, and enclosing
processes with $\varepsilon$ effects and $\chi$ exchanges. As in companion type systems, values that can
be exchanged include ambients and (paths of) capabilities, while the type $\mathit{Shh}$ indicates
no exchange. As for capability types, $\mathit{Cap}(\phi, \chi)$ is the type of capabilities which, when
exercised, unleash processes with $\chi$ exchanges, and compose the effect of the unleashed
process with the thread effect $\phi$. The functional domain of thread effects helps compute
the composition of effects. In brief, thread effects accumulate the results from gets and
puts, and compose these with the effects unleashed by occurrences of opn.

We introduce the following combinators (functions in $\Phi$) to define the thread effects
of the put, get and open capabilities.

$$\mathit{Put} = \lambda(d, i).\,(d - 1,\ \max(0, i - 1))$$
$$\mathit{Get} = \lambda(d, i).\,(\min(0, d + 1),\ i + 1)$$
$$\mathit{Open}(\varepsilon) = \lambda(d, i).\,(\varepsilon + (d, i))$$

The intuition is as follows. A put that prefixes a process $P$ with cumulative effect $(d, i)$
contributes to a “shift” in that effect of one unit. The effect of a get capability is dual. To
illustrate, take $P = \mathsf{put}.\mathsf{put}.\mathsf{get}\ a$. The thread effect associated with $P$ is computed as
follows, where we use function composition in standard order (i.e. $f \circ g(x) = f(g(x))$):

$$\varepsilon = (\mathit{Put} \circ \mathit{Put} \circ \mathit{Get})((0, 0)) = (-2, 0).$$
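
The worked example above can be replayed mechanically. The following is a minimal Python sketch, assuming effects are modelled as plain integer pairs `(d, i)`; the names `put`, `get`, `compose` and `NULL_EFFECT` are illustrative choices, not notation from the paper.

```python
# Effects are pairs (d, i): d <= 0 bounds the slots delivered,
# i >= 0 bounds the slots acquired.
NULL_EFFECT = (0, 0)


def put(effect):
    """Thread effect of the put capability: Put = \\(d, i).(d - 1, max(0, i - 1))."""
    d, i = effect
    return (d - 1, max(0, i - 1))


def get(effect):
    """Thread effect of the get capability: Get = \\(d, i).(min(0, d + 1), i + 1)."""
    d, i = effect
    return (min(0, d + 1), i + 1)


def compose(*functions):
    """Standard-order composition: compose(f, g, h)(x) == f(g(h(x)))."""
    def composed(x):
        for function in reversed(functions):
            x = function(x)
        return x
    return composed


# Thread effect of P = put.put.get a, applied to the null effect.
thread_effect = compose(put, put, get)
print(thread_effect(NULL_EFFECT))  # (-2, 0)
```

The innermost combinator is applied first: `get` turns `(0, 0)` into `(0, 1)`, and the two `put` steps then yield `(-1, 0)` and `(-2, 0)`.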

The intuition about an open capability is similar, but subtler, as the effect of opening an
ambient is, essentially, the effect of the process unleashed by the open: in $\mathsf{opn}\ n.P$, the
process unleashed by $\mathsf{opn}\ n$ runs in parallel with $P$. As a consequence, open has an additive
import in the computation of the effect. To motivate, assume that $n : \mathit{Amb}(\iota, \varepsilon, \chi)$.
Opening $n$ unleashes the enclosed process in parallel to the process $P$. To compute
the resulting effect, we may rely on the effect $\varepsilon$ declared by $n$ to bound the effect
of the unleashed process: that effect is then added to the effect of the continuation
$P$. Specifically, if $P$ has effect $\varepsilon'$, the composite effect of $\mathsf{opn}\ n.P$ is computed
as $\mathit{Open}(\varepsilon)(\varepsilon') = \varepsilon + \varepsilon'$.
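
The additive behaviour of open can be sketched in the same style. This is an illustrative Python fragment under the assumption that effects are integer pairs; the function names and the sample effects `(-1, 2)` and `(-3, 0)` are hypothetical and chosen only to show the arithmetic.

```python
def add_effects(first, second):
    """Component-wise sum of two effects (d, i) + (d', i')."""
    return (first[0] + second[0], first[1] + second[1])


def open_effect(declared):
    """Open(e) = \\(d, i).(e + (d, i)): add the effect declared by the opened ambient."""
    return lambda continuation: add_effects(declared, continuation)


# Hypothetical data: n declares effect (-1, 2); the continuation P has effect (-3, 0).
declared_by_n = (-1, 2)
effect_of_P = (-3, 0)
print(open_effect(declared_by_n)(effect_of_P))  # (-4, 2)
```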

2.2 The Typing Rules 

The typing rules are collected in Figures 2 and 3, where we denote with $\mathrm{id}_{\Phi}$ the identity
element in the domain $\Phi$.

The rules in Figure 2 derive judgements $\Gamma \vdash M : W$ for well-typed messages. The
rules draw on the intuitions we gave earlier. Notice, in particular, that the capabilities
in, out and the co-capability $\overline{\mathsf{opn}}$ have no effect, as reflected by the use of $\mathrm{id}_{\Phi}$ in their type.
The same is true of the co-capability $\mathsf{put}^{\downarrow}$. In fact, by means of the superscript
$k$ in $a^{k}[P]$ we can record the actual weight of the ambient (cf. reduction rule (getD)).
This implies that the weight of an ambient in which $\mathsf{put}^{\downarrow}$ is executed does not change:
the ambient loses a slot, but the weight of one of its sub-ambients increases.



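$$\frac{}{\Gamma, M : W \vdash M : W}\ (\textit{axiom}) \qquad \frac{}{\Gamma \vdash \mathsf{nil} : \mathit{Cap}(\mathrm{id}_{\Phi}, \chi)}\ (\textit{nil})$$

$$\frac{\Gamma \vdash M : \mathit{Amb}(-,-,-)}{\Gamma \vdash \mathsf{get}\ M : \mathit{Cap}(\mathit{Get}, \chi)}\ (\textit{get}\ M) \qquad \frac{}{\Gamma \vdash \mathsf{get}^{\uparrow} : \mathit{Cap}(\mathit{Get}, \chi)}\ (\textit{get}^{\uparrow})$$

$$\frac{}{\Gamma \vdash \mathsf{put} : \mathit{Cap}(\mathit{Put}, \chi)}\ (\textit{put}) \qquad \frac{}{\Gamma \vdash \mathsf{put}^{\downarrow} : \mathit{Cap}(\mathrm{id}_{\Phi}, \chi)}\ (\textit{put}^{\downarrow})$$

$$\frac{\Gamma \vdash M : \mathit{Amb}(-,-,-)}{\Gamma \vdash \mathsf{in}\ M : \mathit{Cap}(\mathrm{id}_{\Phi}, \chi)}\ (\textit{in}\ M) \qquad \frac{\Gamma \vdash M : \mathit{Amb}(-,-,-)}{\Gamma \vdash \mathsf{out}\ M : \mathit{Cap}(\mathrm{id}_{\Phi}, \chi)}\ (\textit{out}\ M)$$

$$\frac{\Gamma \vdash M : \mathit{Amb}(-, \varepsilon, \chi)}{\Gamma \vdash \mathsf{opn}\ M : \mathit{Cap}(\mathit{Open}(\varepsilon), \chi)}\ (\textit{opn}\ M) \qquad \frac{}{\Gamma \vdash \overline{\mathsf{opn}} : \mathit{Cap}(\mathrm{id}_{\Phi}, \chi)}\ (\overline{\textit{opn}})$$

$$\frac{\Gamma \vdash M : \mathit{Cap}(\phi, \chi) \qquad \Gamma \vdash M' : \mathit{Cap}(\phi', \chi)}{\Gamma \vdash M.M' : \mathit{Cap}(\phi \circ \phi', \chi)}\ (\textit{path})$$

Fig. 2. Good Messages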



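The rules in Figure 3 derive judgements $\Gamma \vdash P : \mathit{Proc}(\varepsilon, \chi)$ for well-typed processes.
An inspection of the typing rules shows that any well-typed process is also well-formed
(in the sense of Definition 1). We let $0_{\mathcal{E}}$ denote the null effect $(0,0)$: thus, rules (0) and
(_) simply state that the inert process and the slot form have no effects. Rule (prefix)
computes the effects of prefixes, by applying the thread effect of the capability to the
effect of the process. Rule (par) adds up the effects of two parallel threads, while the
constructs for input, output and restriction do not have any effect.

$$\frac{}{\Gamma \vdash \_ : \mathit{Proc}(0_{\mathcal{E}}, \chi)}\ (\_) \qquad \frac{}{\Gamma \vdash 0 : \mathit{Proc}(0_{\mathcal{E}}, \chi)}\ (0)$$

$$\frac{\Gamma \vdash M : \mathit{Cap}(\phi, \chi) \qquad \Gamma \vdash P : \mathit{Proc}(\varepsilon, \chi)}{\Gamma \vdash M.P : \mathit{Proc}(\phi(\varepsilon), \chi)}\ (\textit{prefix}) \qquad \frac{\Gamma \vdash P : \mathit{Proc}(\varepsilon, \chi) \qquad \Gamma \vdash Q : \mathit{Proc}(\varepsilon', \chi)}{\Gamma \vdash P \mid Q : \mathit{Proc}(\varepsilon + \varepsilon', \chi)}\ (\textit{par})$$

$$\frac{\Gamma, x : W \vdash P : \mathit{Proc}(\varepsilon, W)}{\Gamma \vdash (x : W)P : \mathit{Proc}(\varepsilon, W)}\ (\textit{input}) \qquad \frac{\Gamma \vdash M : W \qquad \Gamma \vdash P : \mathit{Proc}(\varepsilon, W)}{\Gamma \vdash \langle M \rangle P : \mathit{Proc}(\varepsilon, W)}\ (\textit{output})$$

$$\frac{\Gamma, a : \mathit{Amb}(\iota, \varepsilon, \chi) \vdash P : \mathit{Proc}(\varepsilon', \chi')}{\Gamma \vdash (\nu a : \mathit{Amb}(\iota, \varepsilon, \chi))P : \mathit{Proc}(\varepsilon', \chi')}\ (\textit{new})$$

$$\frac{\begin{array}{ll} \Gamma \vdash M : \mathit{Amb}([n, N], \varepsilon, \chi') & [\max(k + d, 0),\ k + i] \leq [n, N] \\ \Gamma \vdash P : \mathit{Proc}((d, i), \chi') \qquad w(P) = k & (d - i,\ \min(N - n, i - d)) \leq \varepsilon \end{array}}{\Gamma \vdash M^{k}[P] : \mathit{Proc}(0_{\mathcal{E}}, \chi)}\ (\textit{amb})$$

$$\frac{\Gamma \vdash P : \mathit{Proc}(\varepsilon, \chi) \qquad w(P) = k}{\Gamma \vdash k \triangleright P : \mathit{Proc}(\varepsilon, \chi)}\ (\textit{spawn}) \qquad \frac{\Gamma \vdash P : \mathit{Proc}(0_{\mathcal{E}}, \chi) \qquad w(P) = 0}{\Gamma \vdash\ !P : \mathit{Proc}(0_{\mathcal{E}}, \chi)}\ (\textit{bang})$$

Fig. 3. Good Processes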

Rule (amb) governs the formation of ambient processes. The declared weight $k$ of
the ambient must reflect the weight of the enclosed process. Two further conditions
ensure (i) that $k$ modified by the effect $(d, i)$ of the enclosed process lies within the
interval $[n, N]$ declared by the ambient type, and (ii) that the effect $\varepsilon$ declared by the ambient
type is a sound approximation for the effects released by opening the ambient itself.
Condition (i) is simply $[\max(k + d, 0),\ k + i] \leq [n, N]$, where the use of $\max(k + d, 0)$ is
justified by observing that the weight of an ambient may never grow negative as a result
of the enclosed process exercising put capabilities. To motivate condition (ii), first observe
that opening an ambient which encloses a process with effect $(d, i)$ may only release
effects $\varepsilon \leq (d - i,\ i - d)$. The lower bound arises in a situation in which the ambient is
opened right after the enclosed process has completed its $i$ get’s and is thus left with
$|d - i|$ put’s unleashed in the opening context. Dually, the upper bound arises when the
ambient is opened right after the enclosed process has completed its $d$ put’s, and is left
with $i - d$ get’s. On the other hand, we also know that the maximum increasing effect
released by opening ambients with weight ranging in $[n, N]$ is $N - n$. Collectively, these
two observations justify the condition $(d - i,\ \min(N - n, i - d)) \leq \varepsilon$ in rule (amb).
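
The two side conditions of rule (amb) are simple arithmetic checks. The sketch below is a hedged Python rendering that assumes intervals and effects are integer pairs and uses the orderings of Section 2.1; the function names, and the second sample call, are illustrative assumptions rather than part of the type system.

```python
def interval_leq(left, right):
    """[n, N] <= [n', N'] when n' <= n and N <= N' (left lies inside right)."""
    (n, N), (n2, N2) = left, right
    return n2 <= n and N <= N2


def effect_leq(left, right):
    """(d, i) <= (d', i') when d' <= d and i <= i'."""
    (d, i), (d2, i2) = left, right
    return d2 <= d and i <= i2


def amb_side_conditions(k, process_effect, interval, declared_effect):
    """Check conditions (i) and (ii) of rule (amb) for an ambient of weight k."""
    d, i = process_effect
    n, N = interval
    # (i) the weight, shifted by the enclosed effect, stays within [n, N]
    condition_i = interval_leq((max(k + d, 0), k + i), (n, N))
    # (ii) the declared effect soundly approximates what opening may release
    condition_ii = effect_leq((d - i, min(N - n, i - d)), declared_effect)
    return condition_i and condition_ii


# Weight 0, enclosed effect (-1, 0), declared type Amb([0, 0], (-1, 0)).
print(amb_side_conditions(0, (-1, 0), (0, 0), (-1, 0)))  # True
# Hypothetical failing case: a get could push the weight to 2, above N = 1.
print(amb_side_conditions(1, (0, 1), (0, 1), (0, 0)))    # False
```

The first call uses the annotation given for the malloc ambient in Section 2.3; the second is a made-up counterexample showing condition (i) failing.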

In rule (spawn), the effect of $k \triangleright P$ is the same as that of the reduct $P$. Finally, to
prevent the effects of duplicated processes from adding up beyond control, with unpredictable
consequences, rule (bang) requires duplicated processes to have null effects.

A first property of the given type system is that all typable preterms are terms. 

The following result complements Proposition 1 and shows that capacity bounds on
ambients are preserved during computations, while the processes’ ability to shrink or
expand reduces.

Theorem 1 (Subject Reduction). Assume $\Gamma \vdash P : \mathit{Proc}(\varepsilon, \chi)$ and $P \searrow^{*} Q$. Then $\Gamma \vdash
Q : \mathit{Proc}(\varepsilon', \chi)$ for some $\varepsilon' \leq \varepsilon$.

It follows as a direct corollary that no ambient may be subject to under/over-flows during
the computation of a process.

Theorem 2 (Absence of Under/Over-Flow). Assume $\Gamma \vdash P : \mathit{Proc}(\varepsilon, \chi)$ and let $P \searrow^{*}
Q$. If $a : \mathit{Amb}([n, N], -, -) \in \Gamma$, then, for any subterm of $Q$ of the form $a^{k}[R]$ not in the
scope of a binder for $a$, we have $n \leq k \leq N$.



2.3 Typed Examples 

A Typed Memory Module. As a first illustration of the typing system at work we give
a typed version of the memory module of Section 1.3. All other examples in that section
are typeable too, and this can be easily verified. We start with the malloc ambient

$$\mathrm{MALLOC} = m[\,\mathsf{out}\ u.\mathsf{in}\ \mathit{mem}.\overline{\mathsf{opn}}.\mathsf{put}.\mathsf{get}\ u.\mathsf{opn}\ m\,]$$

Since there are no exchanges, we give the typing annotation and derivation disregarding
the exchange component from the types. Let $P_{malloc}$ denote the thread enclosed within the
ambient $m$. If we let $m : \mathit{Amb}([0,0], (-1,0)) \in \Gamma$, an inspection of the typing rules for
capabilities and paths shows that the following typing is derivable for any ambient type
assigned to mem:

$$\Gamma \vdash \mathsf{out}\ u.\mathsf{in}\ \mathit{mem}.\overline{\mathsf{opn}}.\mathsf{put}.\mathsf{get}\ u.\mathsf{opn}\ m : \mathit{Cap}(\mathit{Put} \circ \mathit{Get} \circ \lambda\varepsilon.((-1,0) + \varepsilon))$$

Composing the thread effects and applying the result to $0_{\mathcal{E}}$, one has: $(\mathit{Put} \circ \mathit{Get})((-1,0)) = (-1,0)$. From this one
derives $\Gamma \vdash P_{malloc} : \mathit{Proc}((-1,0))$, which gives $\Gamma \vdash \mathrm{MALLOC} : \mathit{Proc}(0_{\mathcal{E}})$. As to the memory
module itself, it is a routine check to verify that the process

$$\mathrm{MEM\_MOD} = \mathit{mem}[\,\_^{256\mathrm{MB}} \mid \overbrace{\mathsf{opn}\ m \mid \cdots \mid \mathsf{opn}\ m}^{256\mathrm{MB}}\,]$$

typechecks with $m : \mathit{Amb}([0,0], (-1,0))$, $\mathit{mem} : \mathit{Amb}([0, 256\mathrm{MB}], (-256\mathrm{MB}, 256\mathrm{MB}))$.
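
The "routine check" can be spelled out numerically. The following Python sketch is an illustration only: it assumes effects are integer pairs, counts 256MB as 256 slot units, and reuses hypothetical helper names (`put`, `get`, `open_effect`, `amb_ok`) that are not part of the paper.

```python
def put(e):
    d, i = e
    return (d - 1, max(0, i - 1))


def get(e):
    d, i = e
    return (min(0, d + 1), i + 1)


def open_effect(declared):
    return lambda e: (declared[0] + e[0], declared[1] + e[1])


def amb_ok(k, effect, interval, declared):
    """Side conditions (i) and (ii) of rule (amb)."""
    (d, i), (n, N) = effect, interval
    low, high = max(k + d, 0), k + i
    cond_i = n <= low and high <= N
    released = (d - i, min(N - n, i - d))
    cond_ii = declared[0] <= released[0] and released[1] <= declared[1]
    return cond_i and cond_ii


M_EFFECT = (-1, 0)          # effect declared by m : Amb([0,0], (-1,0))
UNITS = 256                 # assumption: one slot unit per MB

# Thread effect of out u.in mem.opn-bar.put.get u.opn m applied to (0, 0):
# in, out and the co-capability are identities, so only Put, Get, Open(-1,0) matter.
p_malloc = put(get(open_effect(M_EFFECT)((0, 0))))
print(p_malloc)                                     # (-1, 0)
print(amb_ok(0, p_malloc, (0, 0), M_EFFECT))        # True

# MEM_MOD: 256 slots (null effect) in parallel with 256 copies of opn m,
# each contributing Open((-1,0))((0,0)) = (-1, 0); rule (par) sums them.
mem_effect = (UNITS * M_EFFECT[0], UNITS * M_EFFECT[1])
print(amb_ok(UNITS, mem_effect, (0, UNITS), (-UNITS, UNITS)))  # True
```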

A Cab Trip. As a further example, we give a new version of the cab trip protocol
from [17], formulated in our calculus. A customer sends a request for a cab, which
then arrives and takes the customer to his destination. The use of slots here enables
us to model very naturally the constraint that only one passenger (or actually any fixed
number of them) may occupy a cab. The typing environment contains call : $W_1$, cab : $W_1$,
trip : $W_0$, loading : $W_0$, unloading : $W_0$, bye : $W_0$ where $W_1 = \mathit{Amb}([1,1], 0_{\mathcal{E}})$, $W_0 =
\mathit{Amb}([0,0], 0_{\mathcal{E}})$.






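$\mathrm{CALL}(\mathit{from}, \mathit{client}) = \mathit{call}^{1}[\,\mathsf{out}\ \mathit{client}.\mathsf{out}\ \mathit{from}.\mathsf{in}\ \mathit{cab}.\overline{\mathsf{opn}}.\mathsf{in}\ \mathit{from}.(\mathit{loading}^{0}[\,\mathsf{out}\ \mathit{cab}.\mathsf{in}\ \mathit{client}.\overline{\mathsf{opn}}\,] \mid \_)\,]$

$\mathrm{TRIP}(\mathit{from}, \mathit{to}, \mathit{client}) = \mathit{trip}^{0}[\,\mathsf{out}\ \mathit{client}.\overline{\mathsf{opn}}.\mathsf{out}\ \mathit{from}.\mathsf{in}\ \mathit{to}.\mathit{unloading}^{0}[\,\mathsf{in}\ \mathit{client}.\overline{\mathsf{opn}}\,]\,]$

$\mathrm{CLIENT}(\mathit{from}, \mathit{to}) = (\nu c : W_1)\, c^{1}[\,\mathrm{CALL}(\mathit{from}, c) \mid \mathsf{opn}\ \mathit{loading}.\mathsf{in}\ \mathit{cab}.\mathrm{TRIP}(\mathit{from}, \mathit{to}, c)$
$\qquad\qquad \mid \mathsf{opn}\ \mathit{unloading}.\mathsf{out}\ \mathit{cab}.\mathit{bye}^{0}[\,\mathsf{out}\ c.\mathsf{in}\ \mathit{cab}.\overline{\mathsf{opn}}.\mathsf{out}\ \mathit{to}\,]\,]$

$\mathrm{CAB} = \mathit{cab}^{1}[\,\_ \mid\ !(\mathsf{opn}\ \mathit{call}.\mathsf{opn}\ \mathit{trip}.\mathsf{opn}\ \mathit{bye})\,]$

$\mathrm{SITE}(i) = \mathit{site}_i[\,\mathrm{CLIENT}(\mathit{site}_i, \mathit{site}_j) \mid \mathrm{CLIENT}(\mathit{site}_i, \mathit{site}_l) \mid \cdots \mid \_ \mid \_ \mid \cdots\,]$

$\mathrm{CITY} = \mathit{city}[\,\mathrm{CAB} \mid \mathrm{CAB} \mid \cdots \mid \mathrm{SITE}(1) \mid \cdots \mid \mathrm{SITE}(n) \mid \_ \mid \_ \mid \cdots\,]$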

The fact that only one slot is available in cab, together with the weight 1 of both call
and client, prevents the cab from receiving more than one call and/or more than one client.
Moreover, this encoding also limits the space in each site and in the whole city.

Comparing with [17], we notice that we can deal with the cab’s space satisfactorily
with no need for 3-way synchronisations. Unfortunately, as already observed in [17],
this encoding may lead to unwanted behaviours, since there is no way of preventing
a client from entering a cab different from the one called, and/or the ambient bye from entering a cab
different from the one the client has left. We will give a safe encoding of this example in
Subsection 3.2 using named slots.

3 Controlling Races for Resources 

The calculus of the previous sections provides a simple, yet effective, framework for 
reasoning on resource usage and consumption. On the other hand, it is less effective 
in expressing policies to govern the allocation and distribution of space to distinct, 
possibly, competitive components. Indeed, with the current semantics it is not entirely 
obvious that a given resource unit can be selectively allocated to a specific agent, and 
protected against unintended use. To illustrate, consider the following term (and assume 
it well-formed): 

$$a^{1}[\,\mathsf{in}\ b.P\,] \mid b[\,1 \triangleright Q \mid \_ \mid d[\,c^{1}[\,\mathsf{out}\ d.R\,]\,]\,]$$

Three agents are competing for the resource unit in ambient b: ambients a and c, which 
would use it for their move, and the local spawner inside ambient b. While the race 
between a and c may be acceptable - the resource unit may be allocated by b to any 
migrating agent - it would also be desirable for b to reserve resources for internal use, 
i.e. for spawning new processes. In fact, reserving private space for spawning is possible 
with the current primitives, by encoding a notion of “named resource”. This can be 
accomplished by defining: 

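$$\_{}_{a} \;\triangleq\; a[\,\_ \mid \mathsf{put}\,] \qquad\text{and}\qquad k \triangleright (a, P) \;\triangleq\; (\nu n)\big(n[\,(\mathsf{get}\ a)^{k}.\,k \triangleright \overline{\mathsf{opn}}.P\,] \mid \mathsf{opn}\ n\big)$$

Then, assuming $w(P) = k$, one has $(\nu a)(\_{}_{a}^{k} \mid k \triangleright (a, P)) \cong P$, as desired. It is also possible
to encode a form of “resource renaming”, by defining:

$$\{x/y\}.P \;\triangleq\; (\nu n)\big(n[\,\mathsf{get}\ y.\mathsf{put}.\overline{\mathsf{opn}}\,] \mid x[\,\mathsf{get}\ n.\mathsf{put}\,] \mid \mathsf{opn}\ n.P\big)$$

Then, a $y$-resource can be turned into an $x$-resource: $\{x/y\}.P \mid \_{}_{y} \;\searrow^{*}\; P \mid \_{}_{x}$.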






Encoding a similar form of named, and reserved, resources for mobility is subtler.
On the one hand, it is not difficult to encode a construct for reserving an x-slot for ambients
named x. For example, ambients a and b may agree on the following protocol to
reserve a private slot for the move of a into b. If we want to use the space in ambient b
for moving a we can write the process:

(v/7,g')(/?[in^7.getq'.l[>opn.fl^[_]] | | ^[_| put] | opnpj) 

On the other hand, defining a mechanism to release a named resource to the context 
from which it has been received is more complex, as it amounts to releasing a resource 
with the same name it was allocated to. This can be simulated loosely with the cur- 
rent primitives, by providing a mechanism whereby a migrating ambient releases an 
anonymous slot, which is then renamed by the context that is in control of it. The prob- 
lem is that such a mechanism of releasing and renaming lacks the atomicity required 
to guard against unexpected races for the released resource. Indeed, we conjecture that 
such atomic mechanisms for named resources cannot be defined in the current calculus.

3.1 The Calculus, Refined 

We counter the problem by refining the calculus with named resources as primitive
notions, and by tailoring the constructs for mobility, transfer and spawning accordingly.
Resource units now always come with a tag, as in $\_{}_{\eta}$, where $\eta$ is the unit
name. To make the new calculus a conservative extension of the one presented in §1,
we make provision for a special tag $*$ to be associated with anonymous units: any
process can be spawned on an anonymous slot, and any ambient can be moved
onto it. In addition, we extend the structure of the transfer capabilities, as well as the
constructs for spawning and ambients, as shown in the productions below, which replace
the corresponding ones in §1.

Processes     $P ::= \_{}_{\eta} \mid k \triangleright_{\eta} P \mid M^{k}[P]_{\eta} \mid \ldots$ as in Section 1
Capabilities  $C ::= \mathsf{get}_{\eta}\, M \mid \mathsf{get}^{\uparrow}_{\eta} \mid \ldots$ as in Section 1
Messages      $M ::= \ldots$ as in Section 1

Again, a (well-formed) term is a preterm such that in any subterm of the form $M^{k}[P]_{\eta}$ or
$k \triangleright_{\eta} P$, $P$ has weight $k$. The weight of a process can be computed by rules similar to
those of Section 1. The anonymous slots will often be denoted simply as $\_$, and in
general $\eta$ will be omitted when equal to $*$; subscripts on ambients are omitted when
irrelevant.

The dynamics of the refined calculus is again defined by means of structural con- 
gruence and reduction. Structural congruence is exactly as in Figure 1, the top-level 
reductions are defined in Figure 4. 

The reductions for the transfer capabilities are the natural extensions of the original 
reductions of §1. Here, in addition to naming the target ambient, the get capabilities 
also indicate the name of the unit they request. The choice of the primitives enables 
natural forms of scope extrusion for the names of resources, even though resource tags 
may not be variables. Consider the following system: 

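$$S = n[\,(\nu a)(\mathsf{put}.P \mid \_{}_{a} \mid p[\,\mathsf{out}\ n.\mathsf{in}\ m.\overline{\mathsf{opn}}.\mathsf{get}_{a}\, n\,])\,] \mid m[\,\mathsf{opn}\ p.Q\,]$$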
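
$$(\textsc{enter})\qquad a^{k}[\,\mathsf{in}\ b.P \mid Q\,]_{\rho} \mid b[\,\_{}_{\eta}^{k} \mid R\,] \;\searrow\; \_{}_{\rho}^{k} \mid b[\,a^{k}[\,P \mid Q\,]_{\eta} \mid R\,]$$

$$(\textsc{exit})\qquad \_{}_{\eta}^{k} \mid b[\,P \mid a^{k}[\,\mathsf{out}\ b.Q \mid R\,]_{\rho}\,] \;\searrow\; a^{k}[\,Q \mid R\,]_{\eta} \mid b[\,P \mid \_{}_{\rho}^{k}\,]$$

$$(\textsc{getS})\qquad b^{h+1}[\,\mathsf{put}.P \mid \_{}_{\eta} \mid Q\,] \mid a^{k}[\,\mathsf{get}_{\eta}\, b.R \mid S\,] \;\searrow\; b^{h}[\,P \mid Q\,] \mid a^{k+1}[\,R \mid \_{}_{\eta} \mid S\,]$$

$$(\textsc{getD})\qquad \mathsf{put}^{\downarrow}.P \mid \_{}_{\eta} \mid a^{k}[\,\mathsf{get}^{\uparrow}_{\eta}.Q \mid R\,] \;\searrow\; P \mid a^{k+1}[\,\_{}_{\eta} \mid Q \mid R\,]$$

$$(\textsc{spawn})\qquad k \triangleright_{\eta} P \mid \_{}_{\eta}^{k} \;\searrow\; P$$

The reductions for ambient opening and exchanges are as in Figure 1, and the rules (ENTER)
and (EXIT) have $\eta \in \{a, *\}$ as side condition. The omitted subscripts $\rho$ on ambients are meant to
remain unchanged by the reductions.

Fig. 4. Top-level reductions with named units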



Here, the private resource enclosed within ambient $n$ is communicated to ambient $m$, as
$S \searrow^{*} (\nu a)(n[P] \mid m[\,Q \mid \_{}_{a}\,])$.

The dynamics of mobility solves the problem we discussed above. To complete
a move an ambient $a$ must be granted an anonymous resource or an $a$-resource. The
migrating ambient releases a resource under the name that it was assigned upon the
move (as recorded in the tag associated with the ambient construct). Finally, the new
semantics of spawning acts as expected, by associating the process to be spawned with
a specific set of resources.

These definitions suggest a natural form of resource renaming (or rebinding), noted
$\{\eta/\rho\}_{k}$, with the following operational semantics.

$$\{\eta/\rho\}_{k}.P \mid \_{}_{\rho}^{k} \;\searrow\; P \mid \_{}_{\eta}^{k}$$

Notice that this is a dangerous capability, since it allows processes to give particular 
names to anonymous slots, and for instance put in place possible malicious behaviours 
to make all public resources their own: !{V*}- This suggests that in many situations 
one ought to restrict ki>r| to T| € lA^. The inverse behaviour, that is a “communist for y 
spaces,” is also well-formed and it is often useful (even though not commendable by 
everyone). Notice however that it can be harmful too: $!\{*/y\}$. We have not defined the
name rebinding capability as a primitive of our calculus since it can be encoded using
the new form of spawning as follows, for $a$ fresh.

$$\{\eta/\rho\}_{k}.P \;\triangleq\; (\nu a)\big(k \triangleright_{\rho} (\_{}_{\eta}^{k} \mid a^{0}[\,\overline{\mathsf{opn}}\,]) \mid \mathsf{opn}\ a.P\big)$$

Observe that the simpler encoding $k \triangleright_{\rho} (\_{}_{\eta}^{k} \mid P)$ is allowed only for processes $P$ of
weight 0.

It is easy to check that the type system of Section 2 can be used without modifi- 
cations also for the calculus with named slots. For this calculus the same properties 
proved in Section 2 hold. 






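Theorem 3 (Subject Reduction and Under/Over-Flow Absence). For the processes
and reduction relation of this section, we have:

(i) $\Gamma \vdash P : \mathit{Proc}(\varepsilon, \chi)$ and $P \searrow^{*} Q$ imply $\Gamma \vdash Q : \mathit{Proc}(\varepsilon', \chi)$ with $\varepsilon' \leq \varepsilon$.

(ii) If $\Gamma, a : \mathit{Amb}([n, N], -, -) \vdash P : \mathit{Proc}(\varepsilon, \chi)$, $P \searrow^{*} C[\,a^{k}[R]\,]$, and the shown occurrence
of $a$ is not in the scope of a binder for $a$, then $n \leq k \leq N$.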



3.2 More Examples 

The Cab Trip Revisited. Named slots allow us to avoid unwanted behaviours when
encoding the cab trip example. The cab initially contains one slot named call, but after
reaching the client’s site it will contain one slot with the client’s private name, and lastly
when the client goes out of the cab it leaves one slot named bye. The call exiting the
client leaves one slot tagged forbye, which is reserved for spawning bye. The (resource)
renaming in the sites and in the city allows the public reuse of resources.

Let $W_1 = \mathit{Amb}([1,1], 0_{\mathcal{E}})$, $W_0 = \mathit{Amb}([0,0], 0_{\mathcal{E}})$. As in Subsection 2.3 the typing environment
contains call : $W_1$, cab : $W_1$, trip : $W_0$, unloading : $W_0$, but bye : $W_1$ and we do
not need the ambient loading.

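$\mathrm{CALL}(\mathit{from}, \mathit{client}) = \mathit{call}^{1}[\,\mathsf{out}\ \mathit{client}.\mathsf{out}\ \mathit{from}.\mathsf{in}\ \mathit{cab}.\overline{\mathsf{opn}}.\mathsf{in}\ \mathit{from}.\_{}_{\mathit{client}}\,]_{\mathit{forbye}}$

$\mathrm{TRIP}(\mathit{from}, \mathit{to}, \mathit{client}) = \mathit{trip}^{0}[\,\mathsf{out}\ \mathit{client}.\overline{\mathsf{opn}}.\mathsf{out}\ \mathit{from}.\mathsf{in}\ \mathit{to}.\mathit{unloading}^{0}[\,\mathsf{in}\ \mathit{client}.\overline{\mathsf{opn}}\,]\,]$

$\mathrm{CLIENT}(\mathit{from}, \mathit{to}) = (\nu c : W_1)\, c^{1}[\,\mathrm{CALL}(\mathit{from}, c) \mid \mathsf{in}\ \mathit{cab}.\mathrm{TRIP}(\mathit{from}, \mathit{to}, c)$
$\qquad\qquad \mid \mathsf{opn}\ \mathit{unloading}.\mathsf{out}\ \mathit{cab}.\,1 \triangleright_{\mathit{forbye}} \mathit{bye}^{1}[\,\mathsf{out}\ c.\mathsf{in}\ \mathit{cab}.\overline{\mathsf{opn}}.\mathsf{out}\ \mathit{to}.\_{}_{\mathit{call}}\,]_{*}\,]_{\mathit{bye}}$

$\mathrm{CAB} = \mathit{cab}^{1}[\,\_{}_{\mathit{call}} \mid\ !(\mathsf{opn}\ \mathit{call}.\mathsf{opn}\ \mathit{trip}.\mathsf{opn}\ \mathit{bye})\,]_{*}$

$\mathrm{SITE}(i) = \mathit{site}_i[\,\mathrm{CLIENT}(\mathit{site}_i, \mathit{site}_j) \mid \mathrm{CLIENT}(\mathit{site}_i, \mathit{site}_l) \mid \cdots \mid \_ \mid \_ \mid \cdots \mid\ !\{*/\mathit{forbye}\}_{1} \mid\ !\{*/\mathit{bye}\}_{1}\,]$

$\mathrm{CITY} = \mathit{city}[\,\mathrm{CAB} \mid \mathrm{CAB} \mid \cdots \mid \mathrm{SITE}(1) \mid \cdots \mid \mathrm{SITE}(n) \mid \_ \mid \_ \mid \cdots \mid\ !\{*/\mathit{forbye}\}_{1}\,]$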

A Travel Agency. We conclude the presentation with an example that shows the ex- 
pressiveness of the naming mechanisms for resources in the refined calculus. We wish 
to model clients buying tickets from a travel agency, paying for them with one slot (the $\_{}_{\mathit{forth}}$
inside the client), and then using them to travel by plane. At most two clients may enter
the travel agency, and they are served one by one. The three components of the systems 
are defined below. 

> THE AGENCY : ag^[—]i \ —req \ 

desk'[«,-g^ I \ [opn re q. \ i> forth -tht^y out desk, in cl.CONT]req)\\ 
where CONT = (opn . out ag. in plane, rdy ^ [ out cl]\ —getoff I opn getoff) 

0 THE CLIENT : cl ' [ in ag . req ' [ out cl . in desk . opn . —forth ] th I opn tkt ] d 

0 THE AIRCRAFT : plane [ I opn rfify^’.opn rdy^ .ROUTE. {GETOFF \ GETOFF) ] 
where $\mathrm{GETOFF} = \mathit{getoff}^{1}[\,\mathsf{in}\ \mathit{cl}.\overline{\mathsf{opn}}.\mathsf{out}\ \mathit{plane} \mid \_\,]$ and ROUTE is the unspecified
path modelling the route of the aircraft.

We assume that there exists only one sort of ticket, but it is easy to extend the
example with as many kinds of ticket as possible plane routes. What makes the example 
interesting is the possibility of letting two clients into the agency, but serving them non- 
deterministically in sequence. Notice that the use of the named slots is essential for 
a correct implementation of the protocol. When the request goes to the desk, a slot 
named tkt is left in the client. This slot allows the ticket to enter the client. In this way 
we guarantee that no ticket can enter a client before its request has reached the desk. 

We assume the aircraft to leave only when full. This constraint is implemented by
means of the rdy ambient. The ambient getoff enables the passengers to get off once at
destination; assigning weight 1 to the getoff ambients prevents them from both entering the
same client.

4 Conclusion and Future Work 

We have presented an ambient-like calculus centred around an explicit primitive representing
a resource unit: the space “slot” _. The calculus, dubbed BoCa, features capabilities
for resource control, namely pairs get/put to transfer spaces between sibling
ambients and from parent to child, as well as the capabilities in a and out a for ambient
migration, which represent an abstract mechanism of resource negotiation between
a travelling agent and its source and destination environments. A fundamental ingredient
of the calculus is $\triangleright(\cdot)$, a primitive which consumes space to activate processes.
The combination of such elements makes BoCa a suitable, if preliminary, formalism to
study the role of resource consumption, and the corresponding safety guarantees, in the
dynamics of mobile systems. We have experimented with the all-important notion of
private resource, which has guided our formulation of a refined version of the calculus
featuring named resources.

The presence of the space construct _ induces a notion of weight on processes, and 
by exercising their transfer capabilities, processes may exchange resources with their 
surrounding context, so making it possible to have under- and over-filled ambients. We 
have introduced a type system which prevents such unwanted effects and guarantees 
that the contents of each ambient remain within its declared capacity. 

As we mentioned in the Introduction, our approach is related to the work on Controlled
Mobile Ambients (CMA) [17] and on Finite Control Mobile Ambients [3]. There
are, however, important differences with respect to both approaches.

In CMA the notions of process weight and capacity are entirely characterized at 
the typing level, and so are the mechanisms for resource control (additional control on 
ambient behavior is achieved by means of a three-way synchronization for mobility, but 
that is essentially orthogonal to the mechanisms targeted at resource control). In BoCa, 
instead, we characterize the notions of space and resources directly in the calculus, 
by means of an explicit process constructor, and associated capabilities. In particular, 
the primitives for transferring space, and more generally for the explicit manipulation 
of space and resources by means of spawning and replication appear to be original to 
BoCa, and suitable for the development of formal analyses of the fundamental mechanisms
of the usage and consumption of resources which do not seem to be possible
for CMA.

As to [3], their main goal is to isolate an expressive fragment of Mobile Ambients
for which the model checking problem against the ambient logic can be made decidable. 
Decidability requires guarantees of finiteness which in turn raise boundedness concerns 
that are related to those we have investigated here. However, a more thorough com- 
parison between the two approaches deserves to be made and we leave it to our future 
work. 

Plans for the future include further work in several directions. A finer typing discipline
could be put in place to regulate the behavior of processes in the presence of primi- 
tive notions of named slots. Also, the calculus certainly needs behavioral theories and 
proof techniques adequate for reasoning about resource usage and consumption. Such 
theories and techniques could be assisted by enhanced typing systems providing static 
guarantees of a controlled, and bounded, use of resources, along the lines of the work 
by Hofmann and Jost in [10]. 

A further direction for future development is to consider a version of weighed am- 
bients whose “external” weight is independent of their “internal” weight, that is the 
weight of their contents. This approach sees an ambient as a packaging abstraction 
whose weight may have a different interpretation from that of its contents. For instance, 
in modelling a wallet, the weight of its contents could represent the value of the money 
inside, whereas its external weight could measure the physical space it occupies. A di- 
rectory's internal weight could be the cumulative size of its files, while its external 
weight could be their number. 

Last, but not least, we would like to identify logics for BoCa to formulate (quanti- 
tative) resource properties and analyses; and to model general resource bounds negoti- 
ation and enforcement in the Global Computing scenario. 

Acknowledgements 

We gratefully acknowledge the anonymous referees for careful reading and useful sug- 
gestions. 



References 

1. M. Bugliesi and G. Castagna. Secure safe ambients. In POPL'01, pages 222-235, New 
York, 2001. ACM Press. 

2. L. Cardelli and A. D. Gordon. Mobile ambients. Theoretical Computer Science, 240(1):177- 
213, 2000. Special Issue on Coordination, D. Le Métayer Editor. 

3. W. Charatonik, A. D. Gordon, and J.-M. Talbot. Finite-control mobile ambients. In D. Le 
Métayer, editor, ESOP'02, volume 2305 of LNCS, pages 295-313, Berlin, 2002. Springer- 
Verlag. 

4. K. Crary and S. Weirich. Resource bound certification. In POPL'00, pages 184-198, New 
York, 2000. ACM Press. 

5. J. C. Godskesen, T. Hildebrandt, and V. Sassone. A calculus of mobile resources. In L. Brim, 
P. Jančar, M. Křetínský, and A. Kučera, editors, CONCUR'02, volume 2421 of LNCS, pages 
272-287, Berlin, 2002. Springer-Verlag. 

6. M. Hennessy, M. Merro, and J. Rathke. Towards a behavioural theory of access and mobility 
control in distributed systems (extended abstract). In A. D. Gordon, editor, FOSSACS'03, 
volume 2620 of LNCS, pages 282-299, Berlin, 2003. Springer-Verlag. 






7. M. Hennessy and J. Riely. Information flow vs. resource access in the asynchronous pi- 
calculus. ACM Transactions on Programming Languages and Systems, 24(5):566-591, 
2002. 

8. M. Hennessy and J. Riely. Resource access control in systems of mobile agents. Information 
and Computation, 173:82-120, 2002. 

9. M. Hofmann. The strength of non size-increasing computation. In POPL'02, pages 260-269, 
New York, 2002. ACM Press. 

10. M. Hofmann and S. Jost. Static prediction of heap space usage for first-order functional 
programs. In POPL'03, pages 185-197, New York, 2003. ACM Press. 

11. A. Igarashi and N. Kobayashi. Resource usage analysis. In POPL'02, pages 331-342, New 
York, 2002. ACM Press. 

12. F. Levi and D. Sangiorgi. Controlling interference in Ambients. In POPL'00, pages 352-364, 
New York, 2000. ACM Press. 

13. R. Milner and D. Sangiorgi. Barbed bisimulation. In W. Kuich, editor, ICALP'92, volume 
623 of LNCS, pages 685-695, Berlin, 1992. Springer-Verlag. 

14. R. De Nicola, G. Ferrari, R. Pugliese, and B. Venneri. Types for access control. Theoretical 
Computer Science, 240(1):215-254, 2000. 

15. B. Pierce and D. Sangiorgi. Typing and subtyping for mobile processes. Mathematical 
Structures in Computer Science, 6(5):409-454, 1996. 

16. D. Sangiorgi and A. Valente. A distributed abstract machine for safe ambients. In F. Orejas, 
P. Spirakis, and J. van Leeuwen, editors, ICALP'01, volume 2076 of LNCS, pages 408-420, 
Berlin, 2001. Springer-Verlag. 

17. D. Teller, P. Zimmer, and D. Hirschkoff. Using ambients to control resources. In L. Brim, 
P. Jančar, M. Křetínský, and A. Kučera, editors, CONCUR'02, volume 2421 of LNCS, pages 
288-303, Berlin, 2002. Springer-Verlag. 

18. N. Yoshida and M. Hennessy. Subtyping and locality in distributed higher order mobile pro- 
cesses (extended abstract). In J. C. M. Baeten and S. Mauw, editors, CONCUR'99, volume 
1664 of LNCS, pages 557-573, Berlin, 1999. Springer-Verlag. 



Paradigm Regained: 

Abstraction Mechanisms for Access Control 



Mark S. Miller¹ and Jonathan S. Shapiro² 



¹ Hewlett Packard Laboratories, Johns Hopkins University 
markm@caplet.com 
² Johns Hopkins University 
shap@cs.jhu.edu 



Abstract. Access control systems must be evaluated in part on how well they 
enable one to distribute the access rights needed for cooperation, while simulta- 
neously limiting the propagation of rights which would create vulnerabilities. 
Analysis to date implicitly assumes access is controlled only by manipulating a 
system's protection state - the arrangement of the access graph. Because of the 
limitations of this analysis, capability systems have been “proven” unable to en- 
force some basic policies: revocation, confinement, and the *-properties (ex- 
plained in the text). 

In actual practice, programmers build access abstractions - programs that help 
control access, extending the kinds of access control that can be expressed. 
Working in Dennis and van Horn's original capability model, we show how ab- 
stractions were used in actual capability systems to enforce the above policies. 
These simple, often tractable programs limited the rights of arbitrarily complex, 
untrusted programs. When analysis includes the possibility of access abstrac- 
tions, as it must, the original capability model is shown to be stronger than is 
commonly supposed. 



1 Introduction 

We live in a world of insecure computing. Viruses regularly roam our networks caus- 
ing damage. By exploiting a single bug in an ordinary server or application, an at- 
tacker may compromise a whole system. Bugs tend to grow with code size, so vulner- 
ability usually increases over the life of a given software system. Lacking a readily 
available solution, users have turned to the perpetual stopgaps of virus checkers and 
firewalls. These stopgaps cannot solve the problem - they provide the defender no 
fundamental advantage over the attacker. 

In large measure, these problems are failures of access control. All widely- 
deployed operating systems today - including Windows, UNIX variants, Macintosh, 
and PalmOS - routinely allow programs to execute with excessive and largely un- 
necessary authority. For example, when you run Solitaire, it needs only to render into 
its window, receive UI events, and perhaps save a game state to a file you specify. 
Under the Principle of Least Authority (POLA - closely related to the Principle of 
Least Privilege [Saltzer75]), it would be limited to exactly these rights. Instead, today, 
it runs with all of your authority. It can scan your email for interesting tidbits and sell 
them on eBay to the highest bidder, all the while playing only within the rules of your 
system. Because applications are run with such excessive authority, they serve as 
powerful platforms from which viruses and human attackers penetrate systems and 
compromise data. The flaws exploited are not bugs in the usual sense. Each operating 
system is functioning as specified, and each specification is a valid embodiment of its 
access control paradigm. The flaws lie in the access control paradigm. 

By access control paradigm we mean an access control model plus a way of think- 
ing - a sense of what the model means, or could mean, to its practitioners, and of 
how its elements should be used. 

For purposes of analysis, we pick a frame of reference - a boundary between a 
base system (e.g., a “kernel” or “TCB”) creating the rules of permissible action, and 
programs running on that base, able to act only in permitted ways. In this paper, “pro- 
gram” refers only to programs running on the base, whose access is controlled by its 
rules. 

Whether to enable cooperation or to limit vulnerability, we care about authority 
rather than permissions. Permissions determine what actions an individual program 
may perform on objects it can directly access. Authority describes effects a program 
may cause on objects it can access, either directly by permission, or indirectly by 
permitted interactions with other programs. To understand authority, we must reason 
about the interaction of program behavior and the arrangement of permissions. While 
Dennis and van Horn's 1966 paper, Programming Semantics for Multiprogrammed 
Computations [Dennis66] clearly suggested both the need and a basis for a unified 
semantic view of permissions and program behavior, we are unaware of any formal 
analysis pursuing this approach in the security, programming language, or operating 
system literature. 

Over the last 30 years, the formal security literature has reasoned about bounds on 
authority exclusively from the evolution of state in protection graphs - the arrange- 
ment of permissions. This implicitly assumes all programs are hostile. While conser- 
vatively safe, this approach omits consideration of security enforcing programs. Like 
the access it controls, security policy emerges from the interaction between the behav- 
ior of programs and the underlying protection primitives. This omission has resulted 
in false negatives - mistaken infeasibility results - diverting attention from the possi- 
bility that an effective access control model has existed for 37 years. 

In this paper, we offer a new look at the original capability model proposed by 
Dennis and van Horn [Dennis66] - here called object-capabilities. Our emphasis - 
which was also their emphasis - is on expressing policy by using abstraction to ex- 
tend the expressiveness of object-capabilities. Using abstraction, object-capability 
practitioners have solved problems like revocation (withdrawing previously granted 
rights), overt confinement (cooperatively isolating an untrusted subsystem)¹, and the 
*-properties (enabling one-way communication between clearance levels). We show 
the logic of these solutions, using only functionality available in Dennis and van 
Horn's 1966 Supervisor, hereafter referred to as “DVH.” In the process, we show that 
many policies that have been “proven” impossible are in fact straightforward. 

The balance of this paper proceeds as follows. In “Terminology and Distinctions”, 
we explain our distinction between permission and authority, adapted from Bishop 
and Snyder's distinction between de jure and de facto transfer. In “How Much Au- 
thority Does ‘cp’ Need?”, we use a pair of Unix shell examples to contrast two para- 
digms of access control. In “The Object-Capability Paradigm”, we explain the rela- 
tionship between the object-capability paradigm and the object paradigm. We intro- 
duce the object-capability language E, which we use to show access control 
abstractions. In “Confinement”, we show how object-capability systems confine 
programs rather than uncontrolled subjects. We show how confinement enables a 
further pattern of abstraction, which we use to implement the *-properties. 

¹ Semantic models, specifications, and correct programs deal only in overt causation. Since 
this paper examines only models, not implementations, we ignore covert and side channels. 
In this paper, except where noted, the “overt” qualifier should be assumed. 



2 Terminology and Distinctions 



A direct access right to an object gives a subject permission to invoke the behavior of 
that object. Here, Alice has direct access to /etc/passwd, so she has permission to 
invoke any of its operations. She accesses the object, invoking its read() operation. 

By subject we mean the finest-grain unit of computation on a given system that 
may be given distinct direct access rights. Depending on the system, this could be 
anything from: all processes run by a given user account, all processes running a 
given program, an individual process, all instances of a given class, or an individual 
instance. To encourage anthropomorphism we use human names for subjects. 



[Figure: an arrow from Alice (subject, client) to /etc/passwd (object, provider). The 
arrow is the direct access right (permission); using it is an access (invocation).] 

Fig. 1. Access diagrams depict protection state. 

By object, we mean the finest-grain unit to which separate direct access rights may 
be provided, such as a file, a memory page, or another subject, depending on the sys- 
tem. Without loss of generality, we model restricted access to an object, such as read- 
only access to /etc/passwd, as simple access to another object whose behavior em- 
bodies the restriction, such as access to the read-only facet of /etc/passwd which 
responds only to queries. 

Any discussion of access must carefully distinguish between permission and au- 
thority (adapted from Bishop and Snyder’s distinction between de jure and de facto 
transfer [Bishop79]). Alice can directly read /etc/passwd by calling read(...) 
when the system's protection state says she has adequate permission. Bob (unshown), 
who does not have permission, can indirectly read /etc/passwd so long as Alice 
sends him copies of the text. When Alice and Bob arrange this relying only on the 
“legal” overt rules of the system, we say Alice is providing Bob with an indirect ac- 
cess right to read /etc/passwd, that she is acting as his proxy, and that Bob thereby 
has authority to read it. Bob’s authority derives from the arrangement of permissions 
(Alice's read permission, Alice’s permission to talk to Bob), and from the behavior of 
subjects and objects on permitted causal pathways (Alice's proxying behavior). The 
thin black arrows in our access diagrams depict permissions. We will explain the 
resulting authority relationships in the text. 

The protection state of a system is the arrangement of permissions at some instant 
in time, i.e., the topology of the access graph. Whether Bob currently has permission 
to access /etc/passwd depends only on the current arrangements of permissions. 
Whether Bob eventually gains permission depends on this arrangement and on the 
state and behavior of all subjects and objects that might cause Bob to be granted per- 
mission. We cannot generally predict if Bob will gain this permission, but a conserva- 
tive bound can give us a reliable “no” or “maybe”. 

From a given system’s update rules - rules governing permission to alter permis- 
sions - one might be able to calculate a bound on possible future arrangements by 
reasoning only from the current arrangement². This corresponds to Bishop and Sny- 
der's potential de jure analysis, and gives us an arrangement-only bound on permis- 
sion. With more knowledge, one can set tighter bounds. By taking the state and be- 
havior of some subjects and objects into account, we may calculate a tighter partially 
behavioral bound on permission. 

Bob’s eventual authority to /etc/passwd depends on the arrangement of permis- 
sions, and on the state and behavior of all subjects and objects on permitted causal 
pathways between Bob and /etc/passwd. One can derive a bound on possible overt 
causality by reasoning only from the current arrangement of permissions. This corre- 
sponds to Bishop and Snyder's potential de facto analysis, and gives us an arrange- 
ment-only bound on authority. Likewise, by taking some state and behavior into ac- 
count, we may calculate a tighter partially behavioral bound on authority. 
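
The distinction between permission and authority can be made concrete with a small sketch. Python is used here only as illustration (it is not the paper's notation), and the class names, the method names and the file contents are all assumptions made for the example. Bob holds no reference to the file object, so he has no permission on it, yet Alice's proxying behavior gives him authority to read it.

```python
class PasswdFile:
    """Stands in for /etc/passwd; the contents are made up for illustration."""
    def __init__(self, text):
        self._text = text

    def read(self):
        return self._text


class Alice:
    """Holds a direct reference (permission) to the file and proxies reads."""
    def __init__(self, passwd):
        self._passwd = passwd

    def read_passwd_for_me(self):
        # Alice's behavior, not the protection state alone, is what lets
        # her callers learn the file's contents.
        return self._passwd.read()


class Bob:
    """Holds a reference to Alice only, never to the file itself."""
    def __init__(self, alice):
        self._alice = alice

    def learn_passwd(self):
        return self._alice.read_passwd_for_me()


passwd = PasswdFile("root:x:0:0\n")
bob = Bob(Alice(passwd))
learned = bob.learn_passwd()
```

An arrangement-only analysis sees just the two references (Bob to Alice, Alice to the file) and must answer “maybe”; knowing Alice's code is what tightens the bound.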

Systems have many levels of abstraction. At any moment our frame of reference is 
a boundary between a base system that creates rules and the subjects hosted on that 
base, restricted to play by those rules. By definition, a base system manipulates only 
permissions. Subjects extend the expressiveness of a base system by building abstrac- 
tions whose behavior further limits the authority it provides to others. Taking this 
behavior into account, one can calculate usefully tighter bounds on authority. As our 
description ascends levels of abstraction [Neumann80], the authority manipulated by 
the extensions of one level becomes the permissions manipulated by the primitives of 
the next higher base. Permission is relative to a frame of reference. Authority is in- 
variant. 

It is unclear whether Saltzer and Schroeder’s Principle of Least Privilege is best in- 
terpreted as least permission or least authority. As we will see, there is an enormous 
difference between the two. 



3 How Much Authority Does “cp” Need? 

Consider how the following Unix shell command works: 

$ cp foo.txt bar.txt 

Here, your shell passes to the cp program the two strings "foo.txt" and 
"bar.txt". By these strings, you mean particular files in your file system - your 
namespace of files. In order for cp to open the files you name, it must use your 
namespace, and it must be able to read and write any file you might name that you 
can read and write. Not only does cp operate with all your authority, it must. Given 
this way of using names, cp's least authority still includes all of your authority to the 
file system. So long as we normally install and run applications in this manner, both 
security and reliability are hopeless. 

² The Harrison Ruzzo Ullman paper [Harrison76] is often misunderstood to say this calcula- 
tion is never decidable. HRU actually says it is possible (indeed, depressingly easy) to design 
a set of update rules which are undecidable. At least three protection systems have been 
shown to be decidably safe [Jones76, Shapiro00, Motwani00]. 

By contrast, consider: 

$ cat < foo.txt > bar.txt 

This shell command brings about the same end effect. Although cat also runs 
with all your authority, for this example at least, it does not need to. As with function 
calls in any lexically scoped language (even FORTRAN), the names used to designate 
arguments are evaluated in the caller’s namespace prior to the call (here, by opening 
files). The callee gets direct access to the first-class anonymous objects passed in, and 
designates them with parameter “names” bound in its own private name space (here, 
file descriptor numbers). The file descriptors are granted only to this individual proc- 
ess, so only this process can use these file descriptors to access these files. In this 
case, the two file descriptors passed in are all the authority cat needs to perform this 
request. 
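
The contrast between the two shell commands can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's notation: the dictionary `files` stands in for your file namespace, and both function names are hypothetical. The first function must be handed the whole namespace in order to resolve names; the second receives only the two already-opened streams.

```python
import io


def copy_by_name(namespace, src_name, dst_name):
    # "cp" style: the callee resolves the names itself, so it must be
    # given the whole namespace, including entries it has no need for.
    namespace[dst_name] = namespace[src_name]


def copy_by_reference(src, dst):
    # "cat" style: the caller resolved the names before the call; the
    # callee can touch only the two streams it was handed.
    dst.write(src.read())


# Hypothetical in-memory namespace standing in for your files.
files = {"foo.txt": "hello\n", "secret.txt": "do not touch\n"}

copy_by_name(files, "foo.txt", "bar.txt")

src = io.StringIO(files["foo.txt"])   # caller "opens" the input
dst = io.StringIO()                   # caller "opens" the output
copy_by_reference(src, dst)
files["baz.txt"] = dst.getvalue()
```

Nothing in `copy_by_reference` can reach `secret.txt`, whereas `copy_by_name` could read or overwrite it.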

Today's widely deployed systems use both styles of access control. They grant 
permission to open a file on a per-account basis, creating the pools of authority on 
which viruses grow. These same systems flexibly grant access to a file descriptor on a 
per-process basis. Ironically, only their support of the first style is explained as their 
access control system. Object-capability systems differ from current systems more by 
the elimination of the first style than by the elaboration of the second. 

If support for the first style were eliminated and cat ran with access only to the 
file descriptors passed in, it could still do its job, and we could more easily reason 
about our vulnerabilities to its malice or bugs. In our experience of object-capability 
programming, these radical reductions of authority and vulnerability mostly happen 
naturally. 



4 The Object-Capability Paradigm 

In the object model of computation [Goldberg76, Hewitt73], there is no distinction 
between subjects and objects. A non-primitive object, or instance, is a combination of 
code and state, where state is a mutable collection of references to objects. The com- 
putational system is the dynamic reference graph of objects. Objects - behaving ac- 
cording to their code - interact by sending messages on references. Messages carry 
references as arguments, thereby changing the connectivity of the reference graph. 

The object-capability model uses the reference graph as the access graph, requiring 
that objects can interact only by sending messages on references. To get from objects 
to object-capabilities we need merely prohibit certain primitive abilities which are 
not part of the object model anyway, but which the object model by itself doesn't 
require us to prohibit - such as forged pointers, direct access to another's private state, 
and mutable static state [Kahn88, Rees96, Miller00]. For example, C++, with its abil- 
ity to cast integers into pointers, is still within the object model but not the object- 
capability model. Smalltalk and Java fall outside the object-capability model because 
their mutable static variables enable objects to interact outside the reference graph. 

Whereas the functionality of an object program depends only on the abilities pro- 
vided by its underlying system, the security of an object-capability program depends 
on underlying inabilities as well. In a graph of mutually suspicious objects, one ob- 
ject's correctness depends not only on what the rules of the game say it can do, but 
also on what the rules say its potential adversaries cannot do. 

4.1 The Object-Capability Model 

The following model is an idealization of various object languages and object- 
capability operating systems. All its access control abilities are present in DVH (Den- 
nis and van Horn's Supervisor) and most other object-capability systems³. Object- 
capability systems differ regarding concurrency control, storage management, equal- 
ity, typing, and the primitiveness of messages, so we avoid these issues in our 
model. Our model does assume reusable references, so it may not fit object-capability 
systems based on concurrent logic/constraint programming [Miller87, Kahn88, 
Roy02]. However, our examples may easily be adapted to any object-capability sys- 
tem despite these differences. 

The static state of the reference graph is composed of the following elements. 

• An object is either a primitive or an instance. Later, we explain three kinds of 
primitives: data, devices, and a loader. Data is immutable. 

• An instance is a combination of code and state. We say it is an instance of the 
behavior described by its code. For example, we say an operating system process is 
an instance of its program. 

• An instance's state is a mutable map from indexes to references. 

• A reference provides access to an object, indivisibly combining designation of the 
object, the permission to access it, and the means to access it. The permission ar- 
rows on our access diagrams now depict references. 

• A capability is a reference to non-data. 

• Code is some form of data (such as instructions) used to describe an instance's 
behavior to a loader, as explained below. Code also contains literal data. 

• Code describes how a receiving instance (or “self”) reacts to an incoming message. 

• While an instance is reacting, its addressable references are those in the incoming 
message, in the receiving instance's state, and in the literal data of the receiving in- 
stance's code. The directly accessible objects are those designated by addressable 
references. 

• An index is some form of data used by code to indicate which addressable refer- 
ence to use, or where in the receiving instance's state to store an addressable refer- 
ence. Depending on the system, an index into state may be an instance variable 
name or offset, a virtual memory address, or a capability-list index (a c-list index, 
like a file descriptor number). An index into a message may be an argument posi- 
tion, argument keyword, or parameter name. 



³ Our object-capability model is essentially the untyped lambda calculus with applicative-order 
local side effects and a restricted form of eval - the model Actors and Scheme are based 
on. This correspondence of objects, lambda calculus, and capabilities was noticed several 
times by 1973 [Goldberg76, Hewitt73, Morris73], and investigated explicitly in [Tribble95, 
Rees96]. 

Message passing and object creation dynamically change the graph's connectivity. 

In the initial conditions of Figure 2, Bob and Carol are directly accessible to Alice. 
When Alice sends Bob the message “foo(carol)”, she is both accessing Bob and 
permitting Bob to access Carol. 

Alice can cause effects on the world outside herself only by sending messages to 
objects directly accessible to her (Bob), where she may include, at distinct argument 
indexes, references to objects directly accessible to her (Carol). We model a call- 
return pattern as two messages. For example, Alice gains information from Bob by 
causing Bob (with a query) to cause her to be informed (with a return). 

Bob is affected by the world outside himself only by the arrival of messages sent 
by those with access to him. On arrival, the arguments of the message (Carol) become 
directly accessible to Bob. Within the limits set by these rules, and by what Bob may 
feasibly know or compute, Bob reacts to an incoming message only according to his 
code. All computation happens only in reaction to messages. 
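
As a rough illustration of introduction by message passing, the following Python sketch mirrors the bob.foo(carol) step. Python is not the paper's notation and does not itself enforce the model's restrictions; the `ping` method and the `known` list are hypothetical details added so the effect can be observed.

```python
class Carol:
    def ping(self):
        return "pong"


class Bob:
    def __init__(self):
        self.known = []          # Bob's state: references he has stored

    def foo(self, other):
        # On arrival, the argument is directly accessible to Bob; his
        # code decides whether to store it and how to use it.
        self.known.append(other)
        return other.ping()


class Alice:
    def __init__(self, bob, carol):
        # Initial conditions: Bob and Carol are directly accessible to Alice.
        self._bob = bob
        self._carol = carol

    def introduce(self):
        # Alice both accesses Bob and permits Bob to access Carol.
        return self._bob.foo(self._carol)


carol = Carol()
bob = Bob()
alice = Alice(bob, carol)
before = list(bob.known)
reply = alice.introduce()
```

Before the message Bob holds no reference to Carol; afterwards he does, and only because Alice, who already held both references, chose to send it.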

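[Figure: Alice holds references to Bob and Carol. Alice says bob.foo(carol): the foo 
message travels along Alice's reference to Bob and carries a reference to Carol.] 

Fig. 2. Introduction by Message Passing. 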



Table 1. Object/Capability Corresponding Concepts. 

Model Term   Capability OS Terms        Object Language Terms 
----------   -------------------        --------------------- 
instance     process, domain            instance, closure 
code         non-kernel program         lambda expression, 
             + literal data             class file, method table 
state        address space + c-list     environment, 
             (capability list)          instance variable frame 
index        virtual memory address,    lexical name, variable offset, 
             c-list index               argument position 
loader       domain creator, exec       eval, ClassLoader 



We distinguish three kinds of primitive objects. 

• Data objects, such as the number 3. Access to these is knowledge limited rather 
than permission limited. If Alice can figure out which integer she wants, whether 
3 or your private key, she can have it. Data provides only information, not access. 
Because data is immutable, we need not distinguish between a reference to data 
and the data itself. (In an OS context, we model user-mode compute instructions 
as data operations.) 

• Devices. For purposes of analysis we divide the world into a computational sys- 
tem containing all objects of potential interest, and an external world. On the 
boundary are primitive devices, causally connected to the external world by un- 
explained means. A non-device object can only affect the external world by send- 
ing a message to an accessible output device. A non-device object can only be af- 
fected by the external world by receiving a message from an input device that has 
access to it. 

• A loader makes new instances. The creation request to a loader has two argu- 
ments: code describing the behavior of the new instance, and an index => refer- 
ence map providing all the instance's initial state. A loader must ensure (whether 
by code verification or hardware protection) that the instance's behavior cannot 
violate the rules of our model. A loader returns the only reference to the new in- 
stance. (Below, we use a loader to model nested lambda evaluation.) 

By these rules, only connectivity begets connectivity - all access must derive from 
previous access. Two disjoint subgraphs cannot become connected as no one can 
introduce them. Arrangement-based analysis of bounds on permission proceeds by 
graph reachability arguments. Overt causation, carried only by messages, flows only 
along permitted pathways, so we may again use reachability arguments to reason 
about bounds on authority and causality. The transparency of garbage collection relies 
on such arguments. 
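
The reachability argument can be sketched as a short graph search. The Python below is a minimal illustration, not an algorithm from the paper: the function name and the example graph are assumptions, and treating every reference as an undirected edge is a deliberate over-approximation (it assumes any holder might pass on anything it holds, including a reference to itself). A result of False is a reliable “no”; True means only “maybe”.

```python
from collections import deque


def may_ever_access(references, subject, target):
    """Arrangement-only bound on permission over a reference graph.

    `references` maps each holder to the set of objects it holds
    references to. Behavior of the objects is ignored entirely.
    """
    # Build an undirected view of the graph (conservative assumption).
    neighbors = {}
    for holder, held in references.items():
        for obj in held:
            neighbors.setdefault(holder, set()).add(obj)
            neighbors.setdefault(obj, set()).add(holder)

    # Breadth-first search over the connected component of `subject`.
    seen = {subject}
    queue = deque([subject])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical arrangement: two disjoint subgraphs.
graph = {"alice": {"bob", "carol"}, "dave": {"vault"}}
bob_carol = may_ever_access(graph, "bob", "carol")
bob_vault = may_ever_access(graph, "bob", "vault")
```

Here Bob might be introduced to Carol by Alice, but no sequence of messages can connect him to the vault, since nobody holds references into both subgraphs.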

The object-capability model recognizes the security properties latent in the object 
model. All the restrictions above are consistent with good object programming prac- 
tice even when security is of no concern. 

4.2 A Taste of E 

To illustrate how the object-capability model is used to solve access control problems, 
we use a subset of the E language as our notation. This subset directly embodies our 
object-capability model. All the functionality it provides is present in DVH. Full E 
extends the capability paradigm beyond the model presented above. Using a crypto- 
graphic capability protocol among mutually suspicious machines, E creates a distrib- 
uted persistent reference graph, supporting decentralized access control with some- 
what weaker properties than are possible within a single machine. These issues are 
beyond the scope of this paper. For the rest of this paper, “E” refers to our subset of 
E. 

In E, an instance is a single-parameter closure instantiated by lambda evaluation. 
The single (implicit) parameter is for the incoming message. A message send applies 
an object-as-closure to a message-as-argument. E combines Scheme-like semantics 
[Kelsey98] with syntax for message sending, method dispatch, and soft type check- 
ing, explained below. Here is a simple data abstraction. 

def pointMaker { 
    to make(x :int, y :int) :any { 
        ^def point { 
            to getX() :int { ^x } 
            to getY() :int { ^y } 
            to add(otherPt) :any { 
                ^pointMaker.make(x.add(otherPt.getX()), 
                                 y.add(otherPt.getY())) 
            } 
        } 
    } 
} 

The expressions defining pointMaker and point are object definitions - both a 
lambda expression and a variable definition. An object definition evaluates to a clo- 
sure whose behavior is described by its body, and it defines a variable (shown in bold 
italics) to hold this value. The body consists of to clauses defining methods, and an 
optional match clause as we will see later. Because an object is always applied to a 
message, the message parameter is implicit, as is the dispatch on the message name to 
select a method. The ^ prefix acts like the return keyword of many languages. 
The pointMaker has a single method, make, that defines and returns a new point. 

Method definitions (shown in bold) and variable definitions (shown in italics) can 
have an optional soft type declaration [Cartwright91], shown as a “:” followed by a 
guard expression. A guard determines which values may pass. The any guard used 
above allows any value to pass as is. 

The nested definition of point uses x and y freely. These are its instance vari- 
ables, and together form its state. The state maps from indexes "x" and "y" to the 
associated values from point’s creation context. 

Using the loader explained above, we can transform the above code to 

def pointMaker { 
    to make(x :int, y :int) :any { 
        ^def point := loader.load("def point {...}", 
                                  ["x" => x, "y" => y]) 
    } 
} 

Rather than a source string, a realistic loader would accept some form of separately 
compiled code. 

The expression ["x" => x, "y" => y] builds a map of index => reference as- 
sociations. All “linking” happens only by virtue of these associations - only connec- 
tivity begets connectivity. 
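
The idea that a new instance is linked only through its index => reference map can be imitated in Python, with heavy caveats. The `load` function below is hypothetical, the convention that loaded code defines a name `instance` is an assumption of this sketch, and Python's exec is not a real confinement boundary (a determined program can escape it by introspection). The sketch shows only the shape of the interface: code plus an explicit map, with no ambient names.

```python
def load(code, state):
    """Hypothetical loader: run `code` with only the names in `state`.

    NOTE: illustration only. Unlike the loader in the model, Python's
    exec cannot guarantee that the new instance obeys the rules.
    """
    env = {"__builtins__": {}}    # no ambient names such as open()
    env.update(state)             # the index => reference map
    exec(code, env)
    return env["instance"]        # by convention, the new instance


# Source string for a point whose free variables x and y are supplied
# by the state map rather than by a lexical scope.
POINT_CODE = """
def instance(verb):
    return {"getX": x, "getY": y}[verb]
"""

point = load(POINT_CODE, {"x": 3, "y": 4})
px = point("getX")
py = point("getY")

# Code that reaches for a name it was not given fails to link.
try:
    load("instance = open", {})
    leaked = True
except NameError:
    leaked = False
```

Every name the loaded code can use appears in the map passed to `load`, which is the sense in which all “linking” happens by these associations.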

Applying this transformation recursively would unnest all object definitions. 
Nested object definitions better explain instantiation in object languages. The loader 
better explains process or domain creation in operating systems. In E, we almost al- 
ways use object definitions, but we use the loader below to achieve confinement. 

4.3 Revocation: Redell’s 1974 Caretaker Pattern 

Capability systems modeled as unforgeable references present the other ex- 
treme, where delegation is trivial, and revocation is infeasible. 

- Chander, Dean, Mitchell [Chander01] 

When Alice says bob.foo(carol), she gives Bob unconditional, full, and perpetual 
access to Carol. Given the purpose of Alice's message to Bob, such access may dan- 
gerously exceed least authority. In order to practice POLA, Alice might need to 
somehow restrict the rights she grants to Bob. For example, she might want to ensure 
she can revoke access at a later time. But in a capability system, capabilities them- 
selves are the only representation of permission, and they provide only unconditional, 
full, perpetual access to the objects they designate. 

What is Alice to do? She can use (a slight simplification of) Redell’s Caretaker pat-
tern for revoking access [Redell74], shown here using additional elements of E we
explain below. 

def caretakerMaker {
    to make(var target) :any {
        def caretaker {
            match [verb :String, args :any[]] {
                E.call(target, verb, args)
            }
        }
        def revoker {
            to revoke() :void {
                target := null
            }
        }
        ^ [caretaker, revoker]
    }
}

Instead of saying “bob.foo(carol)”, Alice can say:

def [carol2, carol2Rvkr] := caretakerMaker.make(carol)

bob.foo(carol2)

The Caretaker carol2 transparently forwards messages it receives to target’s 
current value. The Revoker carol2Rvkr changes what that current value is. Alice 
can later revoke the effect of her grant to Bob by saying “carol2Rvkr.revoke()”.

Variables in E are non-assignable by default. The var keyword means target 
can be assigned. (var is the opposite of Java's final.) Within the scope of target’s
definition, make defines two objects, caretaker and revoker, and returns them to
its caller in a two-element list. Alice receives this pair, defines carol2 to be the new
Caretaker, and defines carol2Rvkr to be the corresponding Revoker. Both objects 
use target freely, so they both share access to the same assignable target vari- 
able (which is therefore a separate object). 

What happens when Bob invokes carol2, thinking he’s invoking the kind of thing
Carol is? An object definition contains methods and an optional match clause defin-
ing a matcher. If an incoming message (x.add(3)) doesn’t match any of the meth-
ods, it is given to the matcher. The verb parameter is bound to the message name
("add") and the args to the argument list ([3]). This allows messages to be re-
ceived generically without prior knowledge of their API, much like Smalltalk's
doesNotUnderstand: or Java’s Proxy. Messages are sent generically using
“E.call(...)”, much like Smalltalk’s perform:, Java’s “reflection”, or Scheme's
apply.

This Caretaker* provides a temporal restriction of authority. Similar patterns pro- 
vide other restrictions, such as filtering facets that let only certain messages through. 
Even in systems not designed to support access abstraction, many simple patterns 
happen naturally. Under Unix, Alice might provide a filtering facet as a process read- 
ing a socket Bob can write. The facet process would access Carol using Alice’s per- 
missions. 
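For readers unfamiliar with E, the Caretaker can be approximated in Python. This is a sketch under stated assumptions, not the E code above: make_caretaker and the Carol class are hypothetical names, __getattr__ stands in for E's match clause, and Python offers no protection against code that inspects closures, so the example shows the forwarding and revocation behavior only.

```python
# Python analogy of the Caretaker pattern; not enforceable against
# hostile Python code, which could inspect the closure.

class Carol:
    def greet(self, name):
        return "hello, " + name


def make_caretaker(target):
    """Return a (caretaker, revoker) pair sharing one assignable slot."""
    slot = {"target": target}   # plays the role of `var target`

    class Caretaker:
        def __getattr__(self, verb):
            # Invoked for any attribute Caretaker does not define itself,
            # much like E's matcher receiving [verb, args].
            def forward(*args):
                current = slot["target"]
                if current is None:
                    raise RuntimeError("access revoked")
                return getattr(current, verb)(*args)
            return forward

    class Revoker:
        def revoke(self):
            slot["target"] = None

    return Caretaker(), Revoker()


carol = Carol()
carol2, carol2_rvkr = make_caretaker(carol)

before = carol2.greet("bob")   # forwarded to carol
carol2_rvkr.revoke()
try:
    carol2.greet("bob")
    after = "still works"
except RuntimeError as err:
    after = str(err)
print(before, "/", after)
```

Bob would be handed carol2 only; Alice keeps carol2_rvkr. The choice to raise an error after revocation is an assumption of this sketch.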

4.4 Analysis and Blind Spots 

Given Redell’s existence proof in 1974, what are we to make of subsequent argu-
ments that revocation is infeasible in capability systems? Of those who made this
impossibility claim, as far as we are aware, none pointed to a flaw in Redell’s reason-
ing. The key is the difference between permission and authority analysis.
([Chander01] analyzes, in our terms, only permission.) By such an analysis, Bob was
never given permission to access Carol, so there was no access to Carol to be re- 
voked! Bob was given permission to access carol2, and he still has it. No permis- 
sions were revoked. 



* The simple Caretaker shown here depends on Alice assuming that Carol will not provide 
Carol's clients with direct access to herself. 

See www.erights.org/elib/capability/deadman.html for a more general treatment of revoca- 
tion in E. 
A security officer investigating an incident needs to know who has access to a
compromised object. 

- Karger and Herbert [Karger84] 

In their paper, Karger and Herbert propose to give a security officer a list of all 
subjects who are, in our terms, permitted to access Carol. This list will not include 
Bob’s access to Carol, since this indirect access is represented only by the system’s 
protection state taken together with the behavior of objects playing by the rules. 
Within their system, Alice, by restricting the authority given to Bob as she should, has 
inadvertently thwarted the security officer’s ability to get a meaningful answer to his 
query. 

To render a permission-only analysis useless, a threat model need not include ei- 
ther malice or accident; it need only include subjects following security best prac- 
tices. 
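The gap between the two kinds of analysis can be made concrete with a toy access graph. The Python sketch below is illustrative only: the graph, the forwarders set, and the function names permitted and authorized are assumptions of this example, and "authority" is approximated very coarsely as reachability through objects the analyst knows to forward messages.

```python
# Hypothetical access graph: holder -> objects it holds direct references to.
holds = {
    "alice": {"carol", "carol2", "carol2Rvkr", "bob"},
    "bob": {"carol2"},
    "carol2": {"carol"},     # the Caretaker's target slot
    "carol2Rvkr": set(),     # simplification: the revoker can only cut access
}

# Behavioral knowledge supplied by the analyst (an assumption): objects
# known to forward every message to what they hold.
forwarders = {"carol2"}


def permitted(target):
    """Permission-only view: who holds a direct reference to target."""
    return {h for h, refs in holds.items() if target in refs}


def authorized(target):
    """Coarse authority view: direct holders plus holders of forwarders."""
    result = set(permitted(target))
    frontier = [f for f in forwarders if target in holds.get(f, ())]
    while frontier:
        f = frontier.pop()
        for h in permitted(f):
            if h not in result:
                result.add(h)
                if h in forwarders:
                    frontier.append(h)
    return result


print(sorted(permitted("carol")))
print(sorted(authorized("carol")))
```

A query over permitted alone omits Bob, which is the blind spot described above; the second query finds him only because the Caretaker's behavior was taken into account.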

An arrangement-only bound on permission or authority would include the possibil- 
ity of the Caretaker giving Bob direct access to Carol - precisely what the Caretaker 
was constructed not to do. Only by reasoning about behaviors can Alice see that the 
Caretaker is a “smart reference”. Just as pointMaker extends our vocabulary of data 
types, raising the abstraction level at which we express solutions, so does the Care- 
taker extend our vocabulary for expressing access control. Alice (or her programmer) 
should use arrangement-only analysis for reasoning about what potential adversaries 
may do. But Alice also interacts with many objects, like the Caretaker, because she 
has some confidence she understands their actual behavior. 



4.5 Access Abstraction 

The object-capability model does not describe access control as a separate concern, to 
be bolted on to computation organized by other means. Rather it is a model of modu- 
lar computation with no separate access control mechanisms. All its support for ac- 
cess control is well enough motivated by the pursuit of abstraction and modularity. 
Parnas’ principle of information hiding [Parnas72] in effect says our abstractions 
should hand out information only on a need to know basis. POLA simply adds that 
authority should be handed out only on a need to do basis [Crockford97]. Modularity 
and security each require both of these principles. 

The object-capability paradigm, in the air by 1967 [Wilkes79, Fabry74], and well 
established by 1973 [Redell74, Hewitt73, Morris73, Wulf74, Wulf81], adds the ob-
servation that the abstraction mechanisms provided by the base model are not just for
procedural, data, and control abstractions, but also for access abstractions, such as 
Redell’s Caretaker. (These are “communications abstractions” in [Tribble95]). 

Access abstraction is pervasive in actual capability practice, including filtering fac- 
ets, unprivileged transparent remote messaging systems [Donnelley76, Sansom86, 
Doorn96, Miller00], reference monitors [Rajunas89], transfer, escrow, and trade of
exclusive rights [Miller96, Miller00], and recent patterns like the Powerbox [Wag-
ner02, Stiegler02]. Further, every non-security-oriented abstraction that usefully en-
capsulates its internal state provides, in effect, restricted authority to affect that inter-
nal state, as mediated by the logic of the abstraction.
5 Confinement



... a program can create a controlled environment within which another, pos- 
sibly untrustworthy program, can be run safely... call the first program a cus- 
tomer and the second a service. ... [the service] may leak, i.e. transmit ... the 
input data which the customer gives it. ... We will call the problem of con-
straining a service [from leaking data] the confinement problem.

- Lampson [Lampson73] 

Once upon a time, in the days before wireless, you (a human customer) could buy a 
box containing a calculator (the service) from a manufacturer you might not trust. 
Although you might worry whether the calculations are correct, you can at least enter 
your financial data confident that the calculator cannot leak your secrets back to its 
manufacturer. How did the box solve the confinement problem? By letting you see 
that it comes with no strings attached. When the only causation to worry about would 
be carried by wires, the visible absence of wires emerging from the box - the isolation 
of the subgraph - is adequate evidence of confinement. 

Here, we use this same technique to achieve confinement, substituting capabilities 
for wires. The presentation here is a simplification of confinement in actual object-
capability systems [Hardy86, Shapiro99, Shapiro00, Wagner02, Yee03].

To solve confinement, assume that the manufacturer, Max, and customer, Cassie, 
have mutual access to a (Factory, factoryMaker) pair created by the following 
code. Assume that Cassie trusts that this pair of objects behaves according to this 
code. 

{   interface Factory guards FactoryStamp {...}
    def factoryMaker {
        to make(code :String) :Factory {
            ^ def factory implements FactoryStamp {
                to new(state) :any {
                    ^ loader.load(code, state)
                }
            }
        }
    }
    [Factory, factoryMaker]
}

The interface ... guards expression evaluates to a (trademark guard, stamp)
pair representing a new trademark, similar in purpose to an interface type^. This
syntax also defines variables to hold these objects, here named Factory and
FactoryStamp. Here we use the FactoryStamp to mark instances of factory, and
nothing else, as carrying this trademark. We use the Factory guard in soft type
declarations, like “:Factory” above, to ensure that only objects carrying this
trademark may pass. The block of code above evaluates to a (Factory, factoryMaker)
pair. Only the factoryMaker of a pair can make objects, instances of factory, 
which will pass the Factory guard of that pair. 
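The interplay of stamp, guard, and factoryMaker can be approximated in Python. The sketch below is an analogy resting on assumptions: make_trademark, make_factory_pair and toy_load are hypothetical names, the guard is an ordinary function rather than a soft type declaration, and neither Python closures nor eval with emptied builtins provide real confinement. It shows only that objects produced by one particular factory_maker pass the matching guard.

```python
def make_trademark():
    """Return a (guard, stamp) pair representing a fresh trademark."""
    stamped = []   # private to this pair

    def stamp(obj):
        stamped.append(obj)   # keeps obj alive; acceptable for a sketch
        return obj

    def guard(obj):
        # Identity check: only objects passed through this stamp pass.
        if not any(obj is s for s in stamped):
            raise TypeError("object does not carry this trademark")
        return obj

    return guard, stamp


def make_factory_pair(load):
    """Build a (Factory guard, factory_maker) pair around a loader."""
    factory_guard, factory_stamp = make_trademark()

    def factory_maker(code):
        class Factory:
            def new(self, state):
                return load(code, state)
        return factory_stamp(Factory())   # the stamp itself is never handed out

    return factory_guard, factory_maker


def toy_load(code, state):
    """Stand-in loader (assumption): evaluate an expression over state only."""
    return eval(code, {"__builtins__": {}}, dict(state))


Factory, factory_maker = make_factory_pair(toy_load)
calc_factory = factory_maker("a + b")
Factory(calc_factory)                          # passes the guard
result = calc_factory.new({"a": 2, "b": 3})


class Impostor:
    def new(self, state):
        return "not made by factory_maker"


try:
    Factory(Impostor())
    rejected = False
except TypeError:
    rejected = True
print(result, rejected)
```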



^ Such trademarking can be implemented in DVH and in our model of object-capability com-
putation [Morris73, Miller87, Tribble95, Rees96], so object-capability systems which pro-
vide trademarking primitively [Wulf81, Hardy85, Shapiro99, Yee03] are still within our
model.
Max uses a factoryMaker to package his proprietary calculator program in a
box he sends to Cassie.

def calculatorFactory := factoryMaker.make("...code...")

cassie.acceptProduct(calculatorFactory)

In section 5.2 Cassie uses a “:Factory” declaration on the parameter of her
acceptProduct method to ensure that she receives only an instance of the above
factory definition. Inspection of the factory code shows that a factory's state con- 
tains only data (here, a String) and no capabilities - no access to the world outside 
itself. Cassie may therefore use the factory to make as many live calculators as she 
wants, confident that each calculator has only that access beyond itself that Cassie 
authorizes. They cannot even talk to each other unless Cassie allows them to. 

With lambda evaluation, a new subject’s code and state both come from the same 
parent. To solve the confinement problem, we combine code from Max with state 
from Cassie to give birth to a new calculator, and we enable Cassie to verify that she 
is the only state-providing parent. This state is an example of Lampson’s “controlled
environment”. To Cassie, the calculator is a controlled subject - one Cassie knows is 
born into an environment controlled by her. By contrast, should Max introduce Cassie 
to an already instantiated calculation service, Cassie would not be able to tell whether 
it has prior connectivity. (Extending our analogy, suppose Max offers the calculation 
service from his web site.) The calculation service would be an uncontrolled subject 
to her. 

We wish to reiterate that by “confinement”, we refer to the overt subset of 
Lampson's problem, where the customer accepts only code (“a program”) from the 
manufacturer and instantiates it in a controlled environment. We do not propose to 
confine information or authority given to uncontrolled subjects. 

5.1 A Non-discretionary Model 

Capabilities are normally thought to be discretionary, and to be unable to enforce 
confinement. Our confinement logic above relies on the non-discretionary nature of 
object-capabilities. What does it mean for an access control system to be discretion- 
ary? 

“Our discussion ... rested on an unstated assumption: the principal that cre- 
ates a file or other object in a computer system has unquestioned authority to 
authorize access to it by other principals. ...We may characterize this control 
pattern as discretionary.” [emphasis in the original]

- Saltzer and Schroeder [Saltzer75] 

Object-capability systems have no principals. A human user, together with his shell 
and “home directory” of references, participates, in effect, as just another subject. 
With the substitution of “subject” for “principal”, we will use this classic definition of 
“discretionary”. 

By this definition, object-capabilities are not discretionary. In our model, in DVH, 
and in most actual capability system implementations, even if Alice creates Carol, 
Alice may still only authorize Bob to access Carol if Alice has authority to access 
Bob. If capabilities were discretionary, they would indeed be unable to enforce con- 
finement. To illustrate the power of confinement, we use it below to enforce the *- 
properties. 
5.2 The *-Properties

Boebert made clear in [[Boebert84]] that an unmodified or classic capability 
system cannot enforce the *-property or solve the confinement problem. 

- Gong [Gong89]

Briefly, the *-properties taken together allow subjects with lower (such as “secret”) 
clearance to communicate to subjects with higher (such as “top secret”) clearance, but 
prohibit communication in the reverse direction [Bell74]. KeySafe is a concrete and 
realistic design for enforcing the *-properties on KeyKOS, a pure object-capability 
system [Rajunas89]. However, claims that capabilities cannot enforce the *-properties 
continue [Gong89, Kain87, Wallach97, Saraswat03], citing [Boebert84] as their sup- 
port. Recently, referring to [Boebert84], Boebert writes: 

The paper ... remains, no more than an offhand remark. ... The historical sig- 
nificance of the paper is that it prompted the writing of [[Kain87]] 

- Boebert [Boebert03] 

Boebert here defers to Kain and Landwehr’s paper [Kain87]. Regarding object- 
capability systems, Kain and Landwehr’s paper makes essentially the same impossi- 
bility claims, which they support only by citing and summarizing Boebert. To lay this 
matter to rest, we show how Cassie solves Boebert's challenge problem - how she 
provides a one way comm channel to subjects she doesn't trust, say Q and Bond, who 
she considers to have secret and top secret clearance respectively. Can Cassie prevent 
Boebert's attack, in which Q and Bond use the rights Cassie provides to build a re- 
verse channel? 

Completing our earlier confinement example, Cassie accepts a calculator factory 
from Max using this method. 

to acceptProduct(calcFactory :Factory) :void {
    var diode :int := 0
    def diodeWriter {
        to write(val :int) :void { diode := val }
    }
    def diodeReader {
        to read() :int { ^diode }
    }
    def q := calcFactory.new(["writeUp" => diodeWriter, ...])
    def bond := calcFactory.new(["readDown" => diodeReader, ...])
}

Cassie creates two calculators to serve as Q and Bond. She builds a data diode by 
defining a diodeWriter, a diodeReader, and an assignable diode variable 
they share. She gives Q and Bond access to each other only through the data diode. 
Applied to Cassie's arrangement, Boebert's attack starts by observing that Q can send 
a capability as an argument in a message to the diodeWriter. An arrangement- 
only analysis of bounds on permissions or authority supports Boebert’s case - the data 
diode might introduce this argument to Bond. Only by examining the behavior of the 
data diode can we see the tighter bounds it was built to enforce. It transmits data 
(here, integers) in only one direction and capabilities in neither. (Q cannot even read 
what he just wrote!) Cassie relies on the behavior of the factory and data diode ab- 
stractions to enforce the *-properties and prevent Boebert’s attack. (See [Miller03] for 
further detail.) 
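The behavior of the data diode can be mimicked in Python. As before, this is a hedged analogy rather than the E code: make_diode is a hypothetical name, the explicit type check stands in for the :int guard, and Python cannot actually stop a holder from inspecting the closure. The sketch shows the intended interface: the writer accepts only integers and offers no read, and the reader offers no write.

```python
def make_diode():
    """Return a (writer, reader) pair sharing one integer cell."""
    cell = {"value": 0}   # plays the role of `var diode :int := 0`

    class DiodeWriter:
        def write(self, val):
            # Stand-in for the `:int` guard: only plain integers pass,
            # so no object reference can cross the diode.
            if type(val) is not int:
                raise TypeError("diode carries integers only")
            cell["value"] = val

    class DiodeReader:
        def read(self):
            return cell["value"]

    return DiodeWriter(), DiodeReader()


diode_writer, diode_reader = make_diode()

diode_writer.write(42)          # Q writes up
seen = diode_reader.read()      # Bond reads down

try:
    diode_writer.write(object())   # attempt to pass a reference
    leaked = True
except TypeError:
    leaked = False
print(seen, leaked, hasattr(diode_writer, "read"))
```

In the arrangement described above, Q would receive only diode_writer and Bond only diode_reader.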
5.3 The Arena and Terms of Entry

Policies like the *-properties are generally assumed to govern a computer system as a 
whole, to be enforced in collaboration with a human sys-admin or security officer. In 
a capability system, this is a matter of initial conditions. If the owner of the system 
wishes such a policy to govern the entire system, she can run such code when the 
system is first generated, and when new users join. But what happens after the big 
bang? Let’s say Alice meets Bob, who is an uncontrolled subject to her. Alice can still 
enforce “additive” policies on Bob, e.g., she can give him revocable access to Carol, 
and then revoke it. But she cannot enforce a policy on Bob that requires removing 
prior rights from Bob, for that would violate Bob’s security! 

Instead, as we see in the example above, acting as Lampson's “customer”, Alice
sets up an arena - Lampson’s “controlled environment” - with initial conditions she 
determines, governed by her rules, and over which she is the sys-admin. If her rules 
can be enforced on uncontrolled subjects, she can admit Bob onto her arena as a 
player. If her rules require the players not to have some rights, she must set terms of 
entry. “Please leave your cellphones at the door.” A prospective participant (Max) 
provides a player (calcFactory) to represent his interests within the arena, where 
this player can pass the security check at the gate (here, : Factory). No rights were 
taken away from anyone; participation was voluntary. 

The arena technique corresponds to meta-linguistic abstraction - an arena is a vir- 
tual machine built within a virtual machine [Abelson86, Safra86]. The resulting sys- 
tem can be described according to either level of abstraction - by the rules of the base 
level object-capability system or by the rules of the arena. The subjects built by the 
admitted factories are also subjects within the arena. At the base level, we would say 
Q has permission to send messages to diodeWriter and authority to send integers 
to Bond. At the arena level of description, we would say a data diode is a primitive 
part of the arena’s protection state, and say Q has permission to send integers to Bond. 
Any base level uncontrolled subjects admitted into the arena are devices of the arena 
- they have mysterious connections to the arena's external world. 

When the only input to a problem is data (here, code), any system capable of uni-
versal computation can solve any solvable problem, so questions of absolute possibil-
ity become useless for comparisons. Conventional language comparisons face the
same dilemma, and language designers have learned to ask instead an engineering
question: Is this a good machine on which to build other machines? How well did we
do on Boebert's challenge? The code admitted was neither inspected nor transformed. 
Each arena level subject was also a base level subject. The behavior interposed by 
Cassie between the subjects was very thin. Mostly, we reused the security properties 
of the base level object-capability system to build the security properties of our new 
arena level machine. 



5.4 Mutually Suspicious Composition 

When mutually suspicious interests build a diversity of abstractions to express a di- 
versity of co-existing policies, how do these extensions interact? 

Let's say that Q builds a gizmo that might have bugs, so Q creates a Caretaker to 
give the gizmo revocable access to his diodeWriter. Q's policy relies on the be- 
havior of his Caretaker but not necessarily on Cassie's diodeWriter. To Cassie, 
Q's gizmo and Caretaker are part of Q's subgraph and indistinguishable from Q.
Cassie's policy relies on the behavior of her diodeWriter, but not on Q's Care- 
taker. They each do a partially behavioral analysis over the same graph, each from 
their own subjective perspective. This scenario shows how diverse expressions of 
policy often compose correctly even when none of the interested parties are aware this 
is happening. 



6 Conclusion 

Just as we should not expect a base programming language to provide us all the data 
types we need for computation, we should not expect a base access control system to 
provide us all the elements we need to express our protection policies. Both issues 
deserve the same kind of answer: We use the base to build abstractions, extending the 
vocabulary we use to express our solutions. In evaluating a protection model, one 
must examine how well it supports the extension of its own expressiveness by ab-
straction and composition.

Security in computational systems emerges from the interaction between primitive 
protection mechanisms and the behavior of security enforcing programs. As we have 
shown here, such programs are able to enforce restrictions on more general, untrusted 
programs by building on and abstracting more primitive protection mechanisms. To 
our knowledge, the object-capability model is the only protection model whose se- 
mantics can be readily expressed in programming language terms: approximately, 
lambda calculus with local side effects. This provides the necessary common seman- 
tic framework for reasoning about permission and program behavior together. Be- 
cause security-enforcing programs are often simple, the required program analysis 
should frequently prove tractable, provided they are built on effective primitives. 

By recognizing that program behavior can contribute towards access control, a lost 
paradigm for protection - abstraction - is restored to us, and a semantic basis for 
extensible protection is established. Diverse interests can each build abstractions to 
express their policies regarding new object types, new applications, new require- 
ments, and each other, and these policies can co-exist and interact. This extensibility 
is well outside the scope of traditional access graph analyses. 

Analyses based on the evolution of protection state are conservative approxima- 
tions. A successful verification demonstrating the enforcement of a policy using only 
the protection graph (as in [Shapiro00]) is robust, in the sense that it does not rely on
the cooperative behavior of programs. Verification failures are not robust - they may
indicate a failure in the protection model, but they can also result from what might be 
called “failures of conservatism” - failures in which the policy is enforceable but the 
verification model has been simplified in a way that prevents successful verification. 

We have shown by example how object-capability practitioners set tight bounds on 
authority by building abstractions and reasoning about their behavior, using concep-
tual tools similar to those used by object programmers to reason about any abstraction.
We have shown, using only techniques easily implementable in Dennis and van 
Horn's 1966 Supervisor, how actual object-capability systems have used abstraction 
to solve problems that analyses using only protection state have “proven” impossible 
for capabilities. 
The object-capability paradigm, with its pervasive, fine-grained, and extensible
support for the principle of least authority, enables mutually suspicious interests to 
cooperate more intimately while being less vulnerable to each other. When more 
cooperation may be practiced with less vulnerability, we may find we have a more 
cooperative world. 
Abstract. The paper presents a modular design of a distribution mid-
dleware that supports the wide variety of entities that exist in high level
languages. Such entities are classified into mutables, immutables and 
transients. The design is factorized in order to allow multiple consistency 
protocols for the same entity type, and multiple coordination strategies 
for implementing the protocols that differ in their failure behavior. The 
design is implemented and evaluated. It shows a very competitive per- 
formance. 



1 Introduction 

We present the design and implementation of a middleware library, the DSS 
(Distribution SubSystem). The DSS is designed to simplify the implementation 
of distributed programming systems. Using the DSS we can add distributed pro- 
gramming facilities to programming systems/languages which are normally not 
distributed. We claim that creating a distributed programming system (DPS) that
uses the DSS to handle distribution takes much less programming effort than
explicitly programming the necessary distribution support in the system itself.

Our system aims at completeness; by this we mean that different paradigms of
distributed computing can easily be implemented using the DSS. Completeness
covers both functional aspects (e.g. extending objects to distributed objects
with preserved semantics) and non-functional aspects (e.g. providing the
most efficient distribution support for all patterns of use of distributed objects).

1.1 Background 

Middleware support for programming language distribution - be it partial or 
total - can conceptually be divided into two categories: programming language 
dependent middleware and programming language independent middleware. In- 
dependent middleware commonly targets language interoperability and offers 
only limited distribution support, e.g. CORBA [1]. Language dependent mid-
dleware can potentially offer complete distribution support. However the design
and implementation is time-consuming. In practice, the amount of actual distri- 
bution support in a distributed programming system (DPS) reflects the trade-off 
between desired completeness and the amount of work required to realize it. 

The trade-off is, we believe, reflected in the fact that most DPSs are incom- 
plete even as regards functional aspects, in that distribution does not preserve 
desirable properties of the language semantics [2, 3, 4, 5, 6]. Few systems are func- 
tionally complete [7,8,9], but none, to our knowledge, is complete as regards 
efficiency aspects. 



1.2 Motivation 

This work is motivated by the need for a language independent middleware that
provides full distribution support for an arbitrary high-level programming system
(PS). By full support we mean that all language entities should potentially be
sharable in a distributed computing environment, with preserved semantics. Dis- 
tribution on the level of language entities means that threads residing in different 
processes can share entities as if they were residing in the same process^. 

Examples of language entities are first class data structures such as objects, 
primitive data types or channel abstractions. Note that we exclude unsafe data 
types such as C pointers. On the other hand, code can be shared (e.g. proce- 
dure values in Oz and classes in Java). Furthermore some data types will be 
shared with limited distribution behavior; files, for instance, could be shared as 
stationary objects which only allow remote access. 

Sequential consistency is generally a requirement [10] to preserve the seman- 
tics of many language entities. A prototypical example is the semantics of objects 
in OO languages. However, this does not preclude the use of a weaker consis-
tency model to improve the performance of a distributed application [11], but
from our point-of-view this should be reflected in a different type of language 
entity (e.g. different type of object). 



1.3 Contributions 

The major contributions of this paper can be summarized as follows. Firstly, for 
the developer of a DPS, we provide a model of distribution support for language 
entities, based on the type of distribution support a given entity requires. The 
model is general enough to support all language entities known to us, found in
almost all high-level programming languages/systems (e.g. Java, C# and Oz [12]).

Secondly, for the application developer, we provide a model of distribution 
that guarantees functional properties (i.e. preserving consistency) for a given dis- 
tributed entity. This model also allows for fine-grained control of non-functional 
aspects. Assignment of entity consistency protocols can be done at runtime,
based on the expected pattern of use per entity instance, and not on the entity type.

^ This means location transparency modulo failure and latency. 
Thirdly, we describe a novel component-based design of entity consistency
protocols. The model simplifies implementation of new protocols, increases code 
reuse, and enables fine-grained customization of entity consistency protocols 
from the DPS level. 

Finally, we present the implementation and evaluation of our language in- 
dependent middleware library, called the Distribution SubSystem^ (DSS). The 
DSS efficiently implements the above-described contributions, as shown by our 
evaluation. 



1.4 Paper Organization 

The rest of the paper is organized as follows. In Sect. 2 the language independent 
entity model is described. Sect. 3 describes our novel structure of entity consis- 
tency protocols. The structure and actual implementation of our middleware is 
briefly discussed in Sect. 4. More attention is given to the performance of our 
implementation as shown in Sect. 5. The design is compared to similar systems 
in Sect. 6 and a conclusion is given in Sect. 7. Note that this paper focuses on 
the design of key concepts and evaluation of our middleware library, and not on 
the philosophy behind the design and practical issues such as how a PS can be 
coupled to the library. Those issues are explained in detail in [13]. 

2 An Abstract Model of Language Entities 

The set of language entities found in most high-level programming languages is 
large. These entities are from a programming point of view semantically different, 
even though they might have the same name. However, from the distribution 
point of view, those differences can, to a large degree, be abstracted out and we 
are left with surprisingly few abstract entity types. The proposed model provides 
distribution on the level of abstract entities. 



2.1 The Abstract Entity 

Our shared entity model uses the notion of a local entity instance, acting as 
the local representative for a shared entity. A local entity instance is present at 
every process holding a reference to the shared entity. All instances are inherently 
equal, none is more privileged, i.e. there is no a priori centralized control. Each 
instance is connected to an abstract entity instance, coordinating operations 
performed by threads on the local instances. 

When a local entity instance becomes shared, it may not be accessed di- 
rectly anymore; operations must be directed to its abstract entity instance. All 
interaction with an abstract entity instance is done using abstract operations^,
expressing manipulations of the shared entity. An entity operation is translated 

^ Available for download at http://dss.sics.se. 

^ Analogous to the distinction between abstract and concrete language entities, there
are potentially many concrete operations per abstract operation.
into an abstract operation, expressing a corresponding semantic type of ma-
nipulation. The result of an abstract operation tells the calling thread how to
proceed: perform the operation on the local instance, continue with the next 
instruction or wait for a later decision. 

At any point in time a local entity instance is either complete, i.e. it has a 
representation that allows for local execution of operations, or skeleton, i.e. it 
merely acts as a proxy. The status is explicitly controlled by the abstract entity 
instance. 
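The control flow described in the two preceding paragraphs can be sketched as follows. This Python fragment is a toy model under stated assumptions, not the DSS interface: OpResult, Status, AbstractEntity and read_field are hypothetical names, and the "protocol" merely records the pending operation instead of contacting another process.

```python
from enum import Enum, auto


class OpResult(Enum):
    PROCEED = auto()   # perform the operation on the local instance
    SKIP = auto()      # continue with the next instruction
    SUSPEND = auto()   # wait for a later decision


class Status(Enum):
    COMPLETE = auto()  # local representation allows local execution
    SKELETON = auto()  # local instance merely acts as a proxy


class AbstractEntity:
    """Toy abstract entity instance; it alone controls the local status."""

    def __init__(self, status):
        self.status = status
        self.pending = []   # operations handed over to the protocol

    def abstract_access(self, op):
        if self.status is Status.COMPLETE:
            return OpResult.PROCEED
        # A real protocol would contact another process here.
        self.pending.append(op)
        return OpResult.SUSPEND


def read_field(entity, local_state, key):
    """Translate a concrete read into an abstract access and obey the result."""
    result = entity.abstract_access(("read", key))
    if result is OpResult.PROCEED:
        return local_state[key]
    if result is OpResult.SKIP:
        return None
    return "suspended"      # a real thread would block here


complete = AbstractEntity(Status.COMPLETE)
skeleton = AbstractEntity(Status.SKELETON)
print(read_field(complete, {"x": 1}, "x"))
print(read_field(skeleton, {}, "x"), skeleton.pending)
```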

Entity types that are to be distributed must be matched with a suitable 
abstract entity type. The matching is based on the centralized semantics of the 
entity type. Different abstract entity types capture different functional needs and 
guarantee consistency according to a consistency model (e.g. sequential consis- 
tency). An abstract entity instance actually provides a single interface to a set 
of entity consistency protocols with the same functional properties. In order to 
support distribution, at least one entity consistency protocol is needed per ab- 
stract entity type. However, to efficiently capture non-functional requirements, 
multiple entity consistency protocols are required. A non-functional requirement
might be the maximum number of hops, bandwidth utilization, or resilience to
failures.

2.2 Different Types of Abstract Entities 

We have currently identified three meaningful abstract entity types, all guaran- 
teeing sequential consistency. 

Mutable. This type has two abstract operations. Update indicates that the
state is to be altered while access means to read. The mutable is preferably
used by language entities that allow for destructive updates, e.g. objects.
Suitable protocols for this type are: remote-execution, mobile state [14], and
read/write invalidation.

Immutable. The immutable state of an entity is at some point replicated to
each process referring to it. It can then be accessed through access. This means
that all entity instances eventually become complete and no synchronization
is then needed. Protocols for the immutable are eager, lazy and immediate
replication.

Transient. This type has two abstract operations: access and bind. Bind
terminates the coordination of the entity, thus removing all abstract entity
instances. Access suspends the caller until a bind operation has been
performed. The transient is preferably used for language entities such as logical
variables in Oz [15], and futures in Multilisp [16].
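
The behavior of the transient type can be illustrated with a single-process sketch in which access suspends the caller until bind has been performed. This is only a local analogy under stated assumptions: the class name `Transient`, the `timeout` parameter, and the use of `threading.Event` are choices made for the example; the distributed coordination that the abstract entity performs is not modeled.

```python
import threading


class Transient:
    """Toy, single-process model of a transient (names are assumptions)."""

    def __init__(self):
        self._bound = threading.Event()
        self._lock = threading.Lock()
        self._value = None

    def bind(self, value):
        # A transient can be bound exactly once.
        with self._lock:
            if self._bound.is_set():
                raise RuntimeError("transient already bound")
            self._value = value
            self._bound.set()  # wakes every thread suspended in access()

    def access(self, timeout=None):
        # Suspend the caller until a bind operation has been performed.
        if not self._bound.wait(timeout):
            raise TimeoutError("transient not bound yet")
        return self._value


def demo():
    t = Transient()
    results = []
    reader = threading.Thread(target=lambda: results.append(t.access(timeout=5)))
    reader.start()  # the reader suspends (or finds the value already bound)
    t.bind(42)      # binding releases it
    reader.join()
    return results
```

After bind, every access returns immediately, which mirrors the statement that binding terminates the coordination of the entity.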

Not all language entities guarantee sequential consistency in the centralized 
case. Oz ports and Erlang channels are examples of such entities. Also, asyn- 
chronous remote method invocation[17] is a popular optimization in distributed 
object systems. To efficiently support distribution of this class of entities, we 
provide two abstract entity types that guarantee at most PRAM (or FIFO) [18] 
consistency, called Relaxed Mutable and Relaxed Transient.

Fig. 1. Sequence diagram of the state transporting protocol, e.g. mobile state protocol.



2.3 Interacting with an Abstract Entity 

A language entity interacts with its abstract entity using abstract operations. In 
order to resolve operations an abstract entity needs to interact with its entity 
instance, using four entity-instance callbacks: 

retrieveState. The callback returns a state description of the entity instance, 
i.e. a description that can change any entity instance’s status from skeleton 
to complete. Clearly this is only legitimate if the entity instance from which 
the description is retrieved is complete. 

installState. Install a state description to the entity instance, making it com- 
plete. 

executeOperation. The callback is given a description of a concrete entity
operation to execute on the local entity instance.

resumeThread. A thread previously suspended on an abstract operation is
resumed. The thread is told to either redo the operation or continue with
the next instruction.

The described interaction framework is complete in the sense that either a
state- or an operation-transporting protocol can be used transparently. A
state-transporting protocol allows for local access by the entity instances by
moving a state description to the executing process. In contrast, an
operation-transporting protocol moves an operation description to the process(es)
hosting a complete instance, to be executed there.

A shared object uses the mutable abstract entity. Depending on the chosen
type of protocol, different events can be observed for the same abstract operation.
Below are two examples where an object is distributed using either a state or
an operation transporting protocol. The sequence diagrams show the respective
events for the same method invocation on the shared entity. Note that in both
cases the initiating entity instance has skeleton status.

Example: State Transporting Protocol. Fig. 1 depicts the sequence of
events that occurs when a thread at process A performs a method invocation
(1) on a shared object, whose state is located at process B. The method
invocation is translated into an abstract operation and passed on (2) to the
abstract entity. A request for a state description is sent (3) to the abstract
entity located at process B. Simultaneously the thread is told to suspend itself
(4). The abstract entity at process B receives the state request (5) and uses the
callback retrieveState to get a state description. The description is passed
back (6) to the abstract entity at process A, where it is installed (7) using the
installState callback. Finally the suspended thread is resumed (8), using
resumeThread, and told to redo the operation (9).

Footnote: Due to space limitations the interfaces are described on a conceptual
level. Concrete API descriptions and code examples can be found in [13].

Fig. 2. Sequence diagram of the operation transporting protocol, e.g. remote execution
protocol.

Example: Operation Transporting Protocol. Fig. 2 depicts the sequence of
events that occurs when a thread at process A performs a method invocation (1)
on a shared object, whose state is located at process B. The method invocation is
translated into an abstract operation and passed on (2) to the abstract entity. A
description of the operation is sent (3) to process B. Simultaneously the thread
is told to suspend itself (4). The abstract entity at process B receives (5) the
operation description and uses the callback executeOperation to perform the
operation locally. The operation is executed by a dedicated thread (6) and the
result is returned to the abstract entity (7). The result is passed back to the
abstract entity at process A (8), which passes the result to the suspended thread
and resumes it (9).
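
The difference between the two examples can be made concrete with a small single-process sketch. It is an illustration under assumptions, not the DSS interface: `EntityInstance`, `state_transporting`, `operation_transporting`, and `increment` are hypothetical names, the callbacks are rendered in snake case, message passing between processes A and B is replaced by direct calls, and thread suspension is only recorded in a log.

```python
class EntityInstance:
    """Local entity instance exposing three of the callbacks from the text."""

    def __init__(self, state=None):
        self.state = state  # None models skeleton status

    @property
    def complete(self):
        return self.state is not None

    def retrieve_state(self):
        if not self.complete:
            raise RuntimeError("only a complete instance can supply state")
        description, self.state = self.state, None  # the state moves away
        return description

    def install_state(self, description):
        self.state = description  # the instance becomes complete

    def execute_operation(self, op):
        if not self.complete:
            raise RuntimeError("a skeleton cannot execute operations")
        return op(self.state)


def state_transporting(local, remote, op, log):
    log.append("suspend")
    local.install_state(remote.retrieve_state())  # state travels B -> A
    log.append("resume: redo operation")
    return local.execute_operation(op)            # executed at A


def operation_transporting(local, remote, op, log):
    log.append("suspend")
    result = remote.execute_operation(op)         # operation travels A -> B
    log.append("resume: continue")
    return result                                 # only the result returns


def increment(state):
    state["count"] += 1
    return state["count"]
```

With a state-transporting call the initiating instance ends up complete and the former holder becomes a skeleton; with an operation-transporting call the statuses are unchanged and only the result crosses over.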



2.4 Classifying Language Entities into Abstract Entities 

Assigning abstract entities to language entities can preserve the structuring im- 
posed by the language, but is not required to. Multiple language entities can 
be distributed together as one abstract entity, and one language entity can be 
decomposed into multiple abstract entities. Note that regardless of the chosen 
structuring, an abstract entity must capture the semantics for the structure in 
the centralized case, e.g. an entity that allows for destructive updates should not 
use the immutable type. 

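Footnote: The state request in step (3) of the state transporting example is sent
via the coordination network, to be described later.

Footnote: The dedicated thread in step (6) of the operation transporting example is
a thread created solely for the purpose of remote execution.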






For example, an array can be treated as one (composed) mutable entity or it
can be decomposed into an immutable structure referring to mutable cells. When
distributed, the array structure is replicated, and each replica in turn
refers to the mutable cell instances. Thus, when an array element is updated,
only one cell is affected.
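
The decomposed-array mapping can be sketched as an immutable tuple whose elements are mutable cells. This is a single-process analogy with hypothetical names (`MutableCell`, `make_array`, `replicate`): sharing a cell object between replicas stands in for the coordination that a mutable abstract entity would provide across processes.

```python
class MutableCell:
    """Stand-in for one shared mutable entity (one array element)."""

    def __init__(self, value):
        self.value = value
        self.updates = 0  # how many updates this cell alone has coordinated

    def update(self, value):
        self.value = value
        self.updates += 1

    def access(self):
        return self.value


def make_array(values):
    # The structure is immutable: it only records which cells belong to it.
    return tuple(MutableCell(v) for v in values)


def replicate(structure):
    # Replicating the immutable structure copies references, not cell state.
    return tuple(cell for cell in structure)
```

Updating one element through a replica touches a single cell; the remaining cells, and the structure itself, need no coordination.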

3 Coordinating Entity Instances 

A coordination proxy is present at each process hosting a local entity instance. 
The coordination proxy is connected to its abstract entity (as depicted at the 
left of Fig. 3). 

An entity consistency protocol executes over the dynamic sub-network, called
the coordination network, formed by participating coordination proxies. A
coordination network has a hub, called the coordinator. A property of the network
is that proxies can always contact the coordinator, while the converse is not
necessarily true. For the remote execution protocol, a proxy would send the
operations to the coordinator, which in turn passes them to the proxy where the
state is located.




Fig. 3. The coordination network and the per-process coordination proxy. The abstract
entity is coupled to a coordinating proxy connecting it to the coordination network.
Below to the left is the expanded framework of the entity consistency protocol. Note
that in this example the coordinator is located at one of the processes hosting a
coordination proxy.



3.1 Three Dimensions of Entity Consistency 

The entity consistency protocol is realized as a framework, as shown in the
expansion of the Coordination Proxy in Fig. 3. This divides the strategy for entity
consistency into three separate sub-strategies. Firstly, the memory management
strategy detects when a shared entity is no longer needed, i.e. the number of
proxies reaches one. Secondly, addressing within the coordination network is
realized by a coordination strategy, which also provides a messaging service for
the other two modules. Finally, the consistency strategy upholds the entity
semantics, controlling entity instances connected to the coordination network and
threads performing operations. The three strategies are implemented as three
protocols running in parallel over the coordination network.

Footnote: To minimize this subnet (the coordination network), at most one
coordination proxy is allowed per process.

Each strategy is implemented as a module with a well-specified interface, 
one interface per sub-strategy type. An optimal entity consistency protocol for a 
particular entity and usage pattern is just a matter of composition. This poten- 
tially increases code reuse (in the form of reused sub-strategies) and simplifies 
development of entity consistency protocols. 
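
The claim that a protocol is a matter of composition can be illustrated by enumerating combinations of sub-strategies. The strategy names below are taken from the text, but the sketch makes a simplifying assumption that the paper does not state: that every combination of the three dimensions is valid. `EntityConsistencyProtocol` and `compose_all` are hypothetical names.

```python
from itertools import product
from typing import NamedTuple

# One tuple per dimension; entries are strategy names mentioned in the text.
COORDINATION = ("stationary coordinator", "mobile coordinator",
                "replicated coordinators")
CONSISTENCY = ("remote execution", "mobile state", "read/write invalidation")
MEMORY_MANAGEMENT = ("fractional weighted reference counting",
                     "reference listing", "time lease")


class EntityConsistencyProtocol(NamedTuple):
    coordination: str
    consistency: str
    memory_management: str


def compose_all():
    # Assumption for illustration: any triple of sub-strategies composes.
    return [EntityConsistencyProtocol(*choice)
            for choice in product(COORDINATION, CONSISTENCY, MEMORY_MANAGEMENT)]
```

Under that assumption nine modules yield twenty-seven distinct protocols, which is the kind of reuse the modular design aims for.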

Coordination Strategy. The coordination strategy defines how the messaging 
infrastructure and services are realized. These messaging services are then used
by the consistency and memory management strategies. This includes defining
the location and behavior of the coordinator and providing routines for inter 
coordination-network communication. Examples of coordination strategies are: 
a stationary coordinator, a mobile coordinator and replicated coordinators. 

Consistency Strategy. The consistency strategy resolves abstract operations for
the abstract entity. It is divided into two parts: an end-point unit, present at
each coordination proxy, and an arbitrator, located at the coordinator(s).

Interaction with local entity instances, together with the communication and
addressing services provided by the coordination strategy, simplifies the
implementation of a wide range of protocols:

Remote-Execution. Every end-point sends all operations to the arbitrator 
that sends them on to one selected end-point unit, hosting the complete lo- 
cal entity instance. A synchronous version of the protocol is available to the 
mutable abstract entity, where the write operation is acknowledged (possi- 
bly with a result). An asynchronous version exists for the relaxed mutable 
abstract entity. 

Mobile State. The entity’s state is moved between local entity instances. Re- 
gardless of the type of operation, an end-point requests the state from the 
arbitrator and waits until it arrives. Several consecutive writes and reads can 
then be performed locally. 

Read/Write-Invalidation. A protocol that allows for exclusive update or con- 
current access to a local state. Two versions of the protocol exist. The eager 
protocol records the readers and automatically updates them when the state 
has changed. The lazy protocol requires all readers to actively request read 
permission after an invalidation. 

Pilgrim. A mobile state protocol inspired by the work in [19]. This protocol is
optimized for the case when a small set of proxies reads and writes frequently.

Replication. This class of protocols is used by immutables. For lazy replication,
the state is retrieved when an entity instance first tries to access the state.
For eager replication it is requested immediately after the creation of the
coordination proxy. Immediate replication transports the state with every
reference to an entity, but duplicates are avoided due to the at-most-once
property of coordination proxies.

Once Only. A class of protocols used to realize transient behavior, inspired in
their design by [15].



Fig. 4. The relation between a generic AAPM, a transporting CSC and the underlying
communication system, e.g. the OS. The CSC is free to use whatever means desired to
transport data.

Memory Management Strategy. Properly packaged distributed garbage
collection algorithms [18] detect when the number of coordination proxies reaches
one. When this occurs the last entity instance can be localized, hence dismantling
the coordinator and freeing resources. Of course, by the time localization
is achieved there may be no references left, a case that will usually be handled
by the memory management outside the DSS.

Similar to the consistency strategy, the memory management strategy is
divided into end-point units and one detector (located at the coordinator). The
existence of an end-point at a coordination proxy guarantees that the proxy
is accounted for by the detector. Using this framework the DSS implements:
fractional weighted reference counting [20], reference listing [21], time lease and
persistent entities.
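
The detector's task, noticing when only one coordination proxy remains, can be sketched with a reference-listing style detector. This is a simplified single-process illustration with hypothetical names (`ReferenceListingDetector`, `register`, `unregister`, `on_localize`); failures, message reordering and the other listed algorithms are not modeled.

```python
class ReferenceListingDetector:
    """Toy detector: keeps an explicit list of live coordination proxies."""

    def __init__(self, on_localize):
        self._proxies = set()
        self._on_localize = on_localize  # called with the last remaining proxy

    def register(self, proxy_id):
        # Sent by the end-point unit when a proxy is created.
        self._proxies.add(proxy_id)

    def unregister(self, proxy_id):
        # Sent by the end-point unit when its proxy is reclaimed locally.
        self._proxies.discard(proxy_id)
        if len(self._proxies) == 1:
            (last,) = self._proxies
            self._on_localize(last)  # the entity can now be localized
```

Localization is triggered only on the transition down to one proxy, so a freshly created entity with a single proxy is left alone until it has actually been shared.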

4 Middleware Design and Implementation 

To simplify the coupling of a PS to the DSS, the DSS is internally divided into
two subcomponents: the Advanced Asynchronous Protocol Machine (AAPM),
and the Communication Service Component (CSC). As depicted in Fig. 4, the
AAPM implements the abstract entities, protocols of the coordination network
and a high level messaging service. The messaging service is based on a notion of
remote processes, DSites, providing reliable, in-order, asynchronous messaging.

The CSC is an interface for communication routines and networking tasks 
such as connection establishment, data transportation and failure detection for 
the messaging service. The purpose of the CSC is to abstract away the OS 
from the AAPM, as depicted in Fig. 4. The CSC is easily replaced to enable cus- 
tom implementations based on application knowledge, e.g. specialized addressing 
schemes or failure detectors.

Our implementation of the DSS is a linkable C++ library and our CSC
implementation is a fairly straightforward C++ implementation based upon TCP/IP.
The DSS uses interface classes, mediators, for all complex data structures shared 
with the PS: entity references, entity states, operations and threads. Mediators 
are represented as C++ classes to simplify coupling. Marshaling of mediators is 
a cooperative activity involving both the DSS and the PS. Our model allows for 
late marshaling, i.e. messages are serialized when actually put onto the wire and 
not when inserted into the messaging service. The DSS knows how to serialize 
its internal data. When messages contain mediators, the programming system is 
asked to create a serialized representation. 
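
Late marshaling can be illustrated with a queue that stores message fields as objects and serializes them only when the queue is flushed to the wire. The sketch is an analogy in Python rather than the C++ interface of the DSS: `Mediator`, `MessagingService`, `send`, `flush`, and the JSON encoding are all assumptions made for the example.

```python
import json


class Mediator:
    """Stand-in for a data structure owned by the programming system."""

    def __init__(self, payload):
        self.payload = payload

    def marshal(self):
        # The programming system decides the serialized representation.
        return {"mediator": self.payload}


class MessagingService:
    def __init__(self):
        self._queue = []

    def send(self, *fields):
        # Insertion into the messaging service: nothing is serialized yet.
        self._queue.append(fields)

    def flush(self):
        # Messages are serialized only now, when put onto the wire.
        wire = []
        for fields in self._queue:
            encoded = [f.marshal() if isinstance(f, Mediator) else f
                       for f in fields]
            wire.append(json.dumps(encoded))
        self._queue.clear()
        return wire
```

Because serialization happens at flush time, a mediator that changes between `send` and `flush` is transmitted in its latest form, which is the observable consequence of late marshaling in this sketch.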

5 Evaluation 

In order to evaluate the performance of the DSS we have conducted three
major tests: firstly, a test that measures pure messaging speed; secondly, a test
evaluating the performance of the different consistency strategies, executed in a
controlled environment; and finally, an evaluation of the impact of using different
consistency strategies in a real world application.

To evaluate the DSS we used four different applications/systems: 

Socket-Application. A small C++ socket application which measures the raw
I/O cost for messaging on our cluster.

C++DSS. A thin C++ library on top of the DSS that allows for sharing of 
mutable and transient data structures. This is included to measure the raw 
cost of abstract operations. 

Mozart. Development version 1.3.0 of the distributed programming system that 
implements the multi-paradigm language Oz. Included to measure the differ- 
ence between the tightly integrated distribution layer in Mozart compared 
to a pure Oz virtual machine coupled to the DSS. 

Oz-DSS. Development version 1.3.0 of the Mozart system with its internal dis- 
tribution support replaced by the DSS. The Oz-DSS system is far more ex- 
pressive when it comes to distribution than the original Mozart system and it 
clearly separates the local-execution engine from the distribution subsystem. 

All applications were compiled with gcc 2.95.2 using standard optimizations. 
The tests were conducted on a cluster of AMD ATHLON XP 1900 workstations, 
equipped with 512 MB of memory, interconnected by a 100Mbit LAN. The 
workstations run standard Red Hat Linux version 7.3 without X windows. 



5.1 Messaging Performance 

The test measured the time to perform 10000 sequential remote requests with
one server and a varying number (1-15) of simultaneously running client processes.
The test was conducted with all four applications. The right diagram of Fig. 5 
shows the result for the different applications running with one single client. The 
times are normalized to the socket application.

Fig. 5. The time to perform 10000 sequential remote requests for the four systems.
The left graph shows the result when the number of nodes ranges from 1 to 15. The
right graph shows the result for one node, normalized against the socket application.



The C++DSS application has only a 50% overhead compared to the raw
socket program. This is a surprisingly small difference considering the differences
in functionality. The socket program is highly optimized for the test while
the C++DSS is a generic distribution platform.

The tightly integrated distribution of Mozart gives 12% better performance
than the Oz-DSS system. However, in the light of increased functionality and
superior extensibility, this small difference is certainly acceptable.

The left diagram of Fig. 5 shows the total time to conduct the test when
the number of clients increases. It is interesting to note that for all DPSs the
time increases proportionally to that of the socket application. This indicates, at
least within the interval of 1 to 15 nodes, that the I/O capacity of the underlying
operating system is the dominant factor when communicating with multiple nodes.

5.2 Protocol Evaluation 

In this section, five entity consistency protocols for the mutable abstract entity 
are compared using the Oz-DSS. Factors like the number of participating pro- 
cesses, number of threads per process and the ratio between reads and writes 
were altered in order to show how the protocols performed under different usage 
patterns. 



The Impact of Concurrency. Each process performed 10000 accesses to a
shared state under varying degrees of concurrency, i.e. one thread doing 10000
iterations, two threads doing 5000 iterations each, and up to 100 threads doing 100
iterations each. The test was conducted with 12 clients, using the mobile state,
pilgrim and stationary protocols. The left diagram of Fig. 6 shows the plot of the
total time for all clients to conduct all iterations against the number of threads
per process. Note that increased concurrency has two effects. Firstly, latency will
be masked, which is notable in all tests. Secondly, concurrency will also batch
pending operations when threads wait for the mobile state. This is observable:
the state moving protocols advance from being outperformed to outperforming
the stationary protocol. When the size of the state increases, from one single
integer value to a list of 1000 integer values, the cost for I/O increases, as depicted
in the right diagram of Fig. 6. Since the state moving protocols communicate
less when the degree of concurrency is increased, they perform better than the
stationary protocol.

Fig. 6. The total time for 12 Oz-DSS processes to conduct 10000 accesses to a mutable
state, using three different protocols, under different degrees of concurrency. In the left
graph the mutable state is a single integer value and in the right a list of 1000 integer
values.

Fig. 7. Left: the total system and user time for the server process when 12 Oz-DSS
processes do 10000 state accesses, using three different protocols, under different
degrees of concurrency. Right: The total time for 15 Oz-DSS processes to perform
10000 sequential accesses, using three different protocols, under different read to write
ratios.

When considering the work load of the server, measured as the sum of user
and system time used by the server process, a slightly different picture emerges
(see the left diagram of Fig. 7). To start with, the pilgrim protocol does not
use the server at all. This is due to the long sequences of accesses of the state.
Furthermore, the stationary protocol requires almost the same amount of
resources as the mobile state protocol when the number of threads is small and
is quickly outperformed when concurrency is increased. This shows the strength
of the state moving protocols in batching work and low utilization of the server
process. It also shows their weakness: higher access time when the number of
competing processes for the state grows large.

Utilizing the Read/Write Ratio. The two cache invalidation protocols, with
lazy and eager updates, allow the state of a shared entity to be read in parallel,
but updated at only one process at a time. This allows for a substantial reduction
in message traffic when the ratio of reads to writes is high. To validate this we
conducted a test to see at which point our invalidation protocols are better than
the simple stationary protocol. The test was conducted with 15 client processes,
each performing 10000 sequential accesses to a shared state. The proportion
of operations that were reads ranged from 0 to 99%; the resulting graph is
shown in the right diagram of Fig. 7. The test shows that invalidation with lazy
updates is overall superior to eager updating. The invalidation protocols impose
a notable overhead when the access pattern has a low read to write ratio, but
both protocols also improve in performance notably when the ratio gets high
(above 85% for lazy and 95% for eager).
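
Why the read to write ratio matters can be seen in a toy message-count model. The model is not the measured experiment and does not reproduce its thresholds: the per-operation message costs (two messages per round trip, one invalidation message per outstanding copy, no acknowledgements) and the function names are assumptions chosen only to show the qualitative trend.

```python
def stationary_messages(ops):
    """Every access is a request/reply round trip to the state holder."""
    return 2 * len(ops)


def lazy_invalidation_messages(ops):
    """Count messages for (client, kind) pairs; kind "r" reads, else writes."""
    valid = set()  # clients currently holding a readable copy
    messages = 0
    for client, kind in ops:
        if kind == "r":
            if client not in valid:
                messages += 2  # request read permission + reply with state
                valid.add(client)
            # a client with a valid copy reads locally: no messages
        else:
            messages += 2                      # request write permission + grant
            messages += len(valid - {client})  # invalidate every other copy
            valid = {client}                   # writer holds the only valid copy
    return messages
```

In this model thirty reads spread over three clients cost 6 messages under lazy invalidation against 60 for the stationary protocol, while four writes alternating between two clients cost 11 against 8, matching the direction of the overhead described above.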

5.3 Distributing a Real Application

As a proof of concept, we took an already existing application developed for
the Mozart platform, added the possibility to annotate the distribution behavior
of single entities and tested it on our Oz-DSS system. The application, a
distributed version of the snake game with self-learning actors, is interesting from
a distribution point of view. Each actor, one per node (process), reads a section
of a shared matrix and decides how to make a move, i.e. update an element in the
matrix. The matrix is distributed on the level of single matrix elements.

The tests were conducted with the matrix elements distributed using different
protocols (for mutables). The number of nodes was altered in order to show the
scalability for different consistency strategies (i.e. protocol choice).

The ratio of reads to writes is high in the application. As shown in the
left diagram of Fig. 8, the two invalidation protocols do very well, while both
the migratory and stationary protocols simply do not scale.

By varying the size of the matrix, the interaction between the processes is
implicitly varied. For a smaller matrix, the chance that two processes will read the
same element increases, and for a larger matrix it decreases. The consequences are
depicted in the right diagram of Fig. 8; the eager invalidation protocol performs
better than the lazy protocol on a small matrix and vice versa. It is beneficial
to distribute information on an element update immediately, and not wait until it
is asked for, if the chance that another process will read the element is high.

6 Related Work 

We relate our work to a wide range of middleware systems, both of the language 
dependent and the language independent type.

Fig. 8. Total time to run the distributed snake game for 1000 iterations, with a varying
number of clients. The left graph shows the result for all four protocols. The right
graph shows the result for the two invalidation protocols.



CORBA [1] is an example of programming language independent middleware
focusing on interoperability. CORBA requires data to be structured into objects,
and interaction between objects is achieved solely through method shipping.
The DSS differs from CORBA in that no structuring is enforced; instead, the
natural structuring of the programming language can be mapped to appropriate
distribution support. The DSS supports objects (mutables) in addition to many
other abstract entity types with a multiplicity of entity consistency protocols,
method shipping being only one choice for mutables.

InterWeave [22] is limited to distributing data on the level of abstract memory
pages. Once again this is only one particular mapping to mutables and is
achievable with the DSS as well. Unlike CORBA, but similar to the DSS,
InterWeave has an open architecture for consistency protocols, called the coherence
module with eligible protocols. While we have a dynamic architecture for
coordination, i.e. the coordination strategy, they have chosen a static model with
dedicated servers, much like the stationary coordination strategy in the DSS.
Furthermore, InterWeave has no support for automatic memory management.

.Net [5] offers a single entity consistency protocol for one single type of en- 
tity, objects. It is however possible to change the protocol, using the (not well- 
documented) interception mechanism at considerable performance cost. 

JavaParty [23] and cJVM [24] are, though dedicated to just Java, two
interesting systems with respect to their functionality. Using preprocessing and
new library routines, JavaParty offers true transparency for Java with a proper
thread distribution model and provides mobile state protocols. Similar to our
model, JavaParty allows for definition of entity consistency protocols at runtime
for single objects. cJVM provides features similar to those of JavaParty but with
the approach of extending the runtime system. The architecture is open for protocol
addition; every object can choose from a set of consistency protocols. However,
the protocols are monolithic and cannot be constructed from sub components as
in our model (by composing coordination, consistency, and memory management
strategies). Both systems are geared toward Java objects, and tightly integrated
with the Java system.

The concept of distribution support based on a clear distinction between
mutables and immutables was introduced with the Emerald [25] system. However,
Emerald did not follow up on the potential strengths of this concept, which allows
for a wide range of entity consistency protocols. None of the mentioned systems
explore the domain of abstract entity types as we do, nor do they attempt to
support all high level programming languages. Furthermore, we have found no
trace in the literature exploring what we refer to as mobility for coordinators in
open dynamic distributed systems.

7 Conclusion 

We have presented a novel architecture for a language-independent middleware 
library. This library, the DSS, can be coupled to virtually all high-level program- 
ming languages thus creating powerful distributed programming systems. These 
distributed programming systems can then offer the programmer an extremely 
simple and powerful distributed programming model. 

The messaging capacity of the DSS has been evaluated and compared with
other systems. The evaluation shows that the implementation is efficient
and has low overhead, especially if all the functionality provided by the
middleware is taken into consideration. Furthermore, we have shown that an
appropriate choice of protocol is the most dominant factor when tuning a distributed
application for performance.

A novel design of entity consistency protocols is presented. By separating
functionality into three different parts (strategies), development of new protocols
is greatly simplified. The powerful messaging framework simplifies protocol
development even further. This is indicated by the few lines of C++ code required
to realize the complex consistency strategies mobile-state (281 lines) and eager
invalidation (219 lines).

The comparison between Mozart and Oz-DSS indicates that the benefit of 
tightly integrated distribution support is so small that it is not worth the effort. 

We think that the abstract entity model, together with the large protocol 
base, should make a library based on our model, e.g. the DSS, a first choice for 
any programming system/language that needs distribution support.

Abstract. A coinduction-based technique to generate an optimal mon- 
itor from a Linear Temporal Logic (LTL) formula is presented in this 
paper. Such a monitor receives a sequence of states (one at a time) from 
a running process, checks them against a requirements specification ex- 
pressed as an LTL formula, and determines whether the formula has 
been violated or validated. It can also say whether the LTL formula 
is not monitorable any longer, i.e., that the formula can in the future 
neither be violated nor be validated. A Web interface for the presented 
algorithm adapted to extended regular expressions is available. 



1 Introduction 

Linear Temporal Logic (LTL) [19] is a widely used logic for specifying properties 
of reactive and concurrent systems. The models of LTL are infinite execution 
traces, reflecting the behavior of such systems as ideally always being ready 
to respond to requests, an operating system being a typical example. LTL has been
mainly used to specify properties of finite-state reactive and concurrent systems, 
so that the full correctness of the system can be verified automatically, using 
model checking or theorem proving. Model checking of programs has received an 
increased attention from the formal methods community within the last couple 
of years, and several tools have emerged that directly model check source code 
written in Java or C [7,26,27]. Unfortunately, such formal verification techniques
are not scalable to real-sized systems without exerting a substantial effort to 
abstract the system more or less manually to a model that can be analyzed. 

Testing scales well, and in practice it is by far the technique most used to 
validate software systems. Our approach follows research which merges testing 
and temporal logic specification in order to achieve some of the benefits of both 
approaches; we avoid some of the pitfalls of ad hoc testing as well as the com- 
plexity of full-blown theorem proving and model checking. While this merger 
provides a scalable technique, it does result in a loss of coverage: the technique 
may be used to examine a single execution trace at a time, and may not be used 
to prove a system correct. Our work is based on the observation that software 
engineers are willing to trade coverage for scalability, so our goal is relatively
conservative: we provide tools that use formal methods in a lightweight manner,
use traditional programming languages or underlying executional engines (such 
as JVMs), are completely automatic, implement very efficient algorithms, and 
can help find many errors in programs. 

Recent trends suggest that the software analysis community is interested in 
scalable techniques for software verification. Earlier work by Havelund and Roşu 
[10] proposed a method based on merging temporal logics and testing. The Tem- 
poral Rover tool (TR) and its successor DB Rover by Drusinsky [2] have been 
commercialized. These tools instrument the Java code so that it can check the 
satisfaction of temporal logic properties at runtime. The MaC tool by Lee et 
al. [14,17] has been developed to monitor safety properties in interval past time 
temporal logic. In works by O'Malley et al. and Richardson et al. [20,21], various 
algorithms to generate testing automata from temporal logic formulae are 
described. Java PathExplorer [8] is a runtime verification environment currently 
under development at NASA Ames. It can analyze a single execution trace. The 
Java MultiPathExplorer tool [25] proposes a technique to monitor all equivalent 
traces that can be extracted from a given execution, thus increasing the coverage 
of monitoring. Giannakopoulou et al. and Havelund et al. [4,9] propose efficient 
algorithms for monitoring future time temporal logic formulae, while Havelund 
et al. [11] give a technique to synthesize efficient monitors from past time 
temporal formulae. Roşu et al. [23] show the use of rewriting to perform runtime 
monitoring of extended regular expressions. An approach similar to this paper 
is used to generate optimal monitors for extended regular expressions in work 
by Sen et al. [24]. 

In this paper, we present a new technique based on the modern coalgebraic 
method to generate optimal monitors for LTL formulae. In fact, such monitors 
are the minimal deterministic finite automata required to do the monitoring. Our 
current work makes two major contributions. First, we give a coalgebraic formal- 
ization of LTL and show that coinduction is a viable and reasonably practical 
method to prove monitoring-equivalences of LTL formulae. Second, building on 
the coinductive technique, we present an algorithm to directly generate mini- 
mal deterministic automata from an LTL formula. Such an automaton may be 
used to monitor good or bad prefixes of an execution trace (this notion will be 
rigorously formalized in subsequent sections). 

We describe the monitoring as synchronous and deterministic to obtain min- 
imal good or bad prefixes. However, if the cost of such monitoring is deemed too 
high in some application, and one is willing to tolerate some delay in discovering 
violations, the same technique could be applied on the traces intermittently - 
in which case one would not get minimal good or bad prefixes but could either 
bound the delay in discovering violations, or guarantee eventual discovery. We 
also give lower and upper bounds on the size of such automata. 

The closely related work by Geilen [3] builds monitors to detect a subclass 
of bad and good prefixes, which are called informative bad and good prefixes. 
Using a tableau-based technique, [3] can generate monitors of exponential size 
for informative prefixes. In our approach, we generate minimal monitors for 
detecting all kinds of bad and good prefixes. This generality comes at a price: 
the size of our monitors can be doubly exponential in the worst case, and this 
complexity cannot be avoided. 

One standard way to generate an optimal monitor is to use the Büchi automata 
construction [16] for LTL to generate a non-deterministic finite automaton, 
determinize it and then minimize it. In this method, one checks only 
the syntactic equivalence of LTL formulae. In the coalgebraic technique that we 
propose as an alternative method, we make use of the monitoring equivalence 
(defined in subsequent sections) of LTL formulae. We thus obtain the minimal 
automaton in a single go and minimize the usage of computational space. More- 
over, our technique is completely based on deductive methods and can be applied 
to any logic or algebra for which there is a suitable behavioral specification. A 
related application can be found in [24], in which the minimal deterministic finite 
automaton for extended regular expressions is generated. 



2 Linear Temporal Logic and Derivatives 

In order to make the paper self-contained, we briefly describe classical Linear 
Temporal Logic over infinite traces. We use the classical definition of Linear 
Temporal Logic and assume a finite set AP of atomic propositions. The syntax 
of LTL is as follows: 

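\[
\begin{array}{rcll}
\phi & ::= & \mathit{true} \mid \mathit{false} \mid a \in AP \mid \neg\phi \mid \phi \wedge \phi \mid \phi \vee \phi \mid \phi \rightarrow \phi \mid \phi \equiv \phi \mid \phi \oplus \phi & \text{propositional} \\
     & \mid & \phi \;\mathcal{U}\; \phi \mid \circ\phi \mid \Box\phi \mid \Diamond\phi & \text{temporal}
\end{array}
\]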



The semantics of LTL is given for infinite traces. An infinite trace is an infinite 
sequence of program states, each state denoting the set of atomic propositions 
that hold at that state. The atomic propositions that hold in a given state $s$ are 
given by $AP(s)$. We denote an infinite trace by $\rho$; $\rho(i)$ denotes the $i$-th state in 
the trace and $\rho^i$ denotes the suffix of the trace $\rho$ starting from the $i$-th state. 
The notion that an infinite trace $\rho$ satisfies a formula $\phi$ is denoted by $\rho \models \phi$ 
and is defined inductively as follows: 



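\[
\begin{array}{ll}
\rho \models \mathit{true} \text{ for all } \rho & \rho \not\models \mathit{false} \text{ for all } \rho \\
\rho \models a \text{ iff } a \in AP(\rho(1)) & \rho \models \neg\phi \text{ iff } \rho \not\models \phi \\
\rho \models \phi_1 \vee \phi_2 \text{ iff } \rho \models \phi_1 \text{ or } \rho \models \phi_2 & \rho \models \phi_1 \wedge \phi_2 \text{ iff } \rho \models \phi_1 \text{ and } \rho \models \phi_2 \\
\rho \models \phi_1 \oplus \phi_2 \text{ iff } \rho \models \phi_1 \text{ exclusive or } \rho \models \phi_2 & \rho \models \phi_1 \rightarrow \phi_2 \text{ iff } \rho \models \phi_1 \text{ implies } \rho \models \phi_2 \\
\rho \models \phi_1 \equiv \phi_2 \text{ iff } (\rho \models \phi_1 \text{ iff } \rho \models \phi_2) & \rho \models \circ\phi \text{ iff } \rho^2 \models \phi \\
\rho \models \Box\phi \text{ iff } \forall j \geq 1 :\ \rho^j \models \phi & \rho \models \Diamond\phi \text{ iff } \exists j \geq 1 \text{ such that } \rho^j \models \phi \\
\multicolumn{2}{l}{\rho \models \phi_1 \;\mathcal{U}\; \phi_2 \text{ iff there exists a } j \geq 1 \text{ such that } \rho^j \models \phi_2 \text{ and } \forall\, 1 \leq i < j :\ \rho^i \models \phi_1}
\end{array}
\]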



The set of all infinite traces that satisfy the formula $\phi$ is called the language 
expressed by the formula $\phi$ and is denoted by $L_\phi$. Thus, $\rho \in L_\phi$ if and only if 
$\rho \models \phi$. The language $L_\phi$ is also called the property expressed by the formula $\phi$. 
We informally say that an infinite trace $\rho$ satisfies a property $\phi$ iff $\rho \models \phi$. 

A property in LTL can be seen as the intersection of a safety property and 
a liveness property [1]. A property is a liveness property if for every finite trace 
$\alpha$ there exists an infinite trace $\rho$ such that $\alpha.\rho$ satisfies the property. A property 
is a safety property if for every infinite trace $\rho$ not satisfying the property, there 
exists a finite prefix $\alpha$ such that for all infinite traces $\rho'$, $\alpha.\rho'$ does not satisfy 
the property. The prefix $\alpha$ is called a bad prefix [3]. Thus, we say that a finite 
prefix $\alpha$ is a bad prefix for a property if for all infinite traces $\rho$, $\alpha.\rho$ does not 
satisfy the property. On the other hand, a good prefix for a property is a prefix 
$\alpha$ such that for all infinite traces $\rho$, $\alpha.\rho$ satisfies the property. A bad or a good 
prefix can also be minimal. We say that a bad (or good) prefix $\alpha$ is minimal if 
$\alpha$ is a bad (or good) prefix and no proper prefix $\alpha'$ of $\alpha$ is a bad (or good) prefix. 

We use a novel coinduction-based technique to generate an optimal monitor 
that can detect good and bad prefixes incrementally for a given trace. The es- 
sential idea is to process, one by one, the states of a trace as these states are 
generated; at each step the process checks if the finite trace that we have already 
generated is a minimal good prefix or a minimal bad prefix. At any point, if we 
find that the finite trace is a minimal bad prefix, we say that the property is 
violated. If the finite trace is a minimal good prefix then we stop monitoring for 
that particular trace and say that the property holds for that trace. 

At any step, we will also detect if it is not possible to monitor a formula any 
longer. We may stop monitoring at that point and say the trace is no longer 
monitorable and save the monitoring overhead. Otherwise, we continue by pro- 
cessing one more state and appending that state to the finite trace. We will see 
in the subsequent sections that these monitors can report a message as soon as 
a good or a bad prefix is encountered; therefore, the monitors are synchronous. 
Two more variants of the optimal monitor are also proposed; these variants can 
be used to efficiently monitor either bad prefixes or good prefixes (rather than 
both). Except in degenerate cases, such monitors have smaller sizes than the 
monitors that can detect both bad and good prefixes. 
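
The monitoring loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `monitor` and `until_step` are hypothetical, and the transition function for the formula a U b is written by hand here rather than generated by the algorithm of this paper. The two final states are called "true" and "false", and each program state is the set of atomic propositions that hold in it.

```python
def monitor(step, initial, trace):
    """Run a GB-automaton over `trace`, one program state at a time.

    `step(q, s)` is the transition function; "true" and "false" are the two
    final states. Returns (verdict, number of states consumed).
    """
    q, consumed = initial, 0
    for s in trace:
        if q in ("true", "false"):
            break                      # a minimal good/bad prefix was already seen
        q = step(q, frozenset(s))
        consumed += 1
    verdict = {"true": "satisfied", "false": "violated"}.get(q, "inconclusive")
    return verdict, consumed


def until_step(q, s):
    """Hand-written transitions for the formula 'a U b' (illustrative only).

    The only non-final state is "a U b" itself, so q is not inspected.
    """
    if "b" in s:
        return "true"                  # b holds: every continuation satisfies a U b
    if "a" in s:
        return "a U b"                 # still waiting for b
    return "false"                     # neither a nor b: bad prefix


print(monitor(until_step, "a U b", [{"a"}, {"a"}, {"b"}, {"a"}]))
```

The loop stops consuming states as soon as a final state is reached, which is what makes the reported prefix minimal.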

In order to generate the minimal monitor for an LTL formula, we will use 
several notions of equivalence for LTL: 

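Definition 1 ($\equiv$). We say that two LTL formulae $\phi_1$ and $\phi_2$ are equivalent, i.e., 
$\phi_1 \equiv \phi_2$, if and only if $L_{\phi_1} = L_{\phi_2}$. 

Definition 2 ($\equiv_B$). For a finite trace $\alpha$ we say that $\alpha \not\models \phi$ iff $\alpha$ is a bad prefix 
for $\phi$, i.e., for every infinite trace $\rho$ it is the case that $\alpha.\rho \notin L_\phi$. Given two LTL 
formulae $\phi_1$ and $\phi_2$, $\phi_1$ and $\phi_2$ are said to be bad prefix equivalent, i.e., $\phi_1 \equiv_B \phi_2$, 
if and only if for every finite trace $\alpha$, $\alpha \not\models \phi_1$ iff $\alpha \not\models \phi_2$. 

Definition 3 ($\equiv_G$). For a finite trace $\alpha$ we say that $\alpha \models \phi$ iff $\alpha$ is a good prefix 
for $\phi$, i.e., for every infinite trace $\rho$ it is the case that $\alpha.\rho \in L_\phi$. Given two 
LTL formulae $\phi_1$ and $\phi_2$, $\phi_1$ and $\phi_2$ are said to be good prefix equivalent, i.e., 
$\phi_1 \equiv_G \phi_2$, if and only if for every finite trace $\alpha$, $\alpha \models \phi_1$ iff $\alpha \models \phi_2$. 

Definition 4 ($\equiv_{GB}$). We say that $\phi_1$ and $\phi_2$ are good-bad prefix equivalent, 
i.e., $\phi_1 \equiv_{GB} \phi_2$, if and only if $\phi_1 \equiv_B \phi_2$ and $\phi_1 \equiv_G \phi_2$. 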

Thus, for our purpose, the two non-equivalent formulae $\Box\Diamond\phi$ and $\Diamond\Box\phi$ are 
good-bad prefix equivalent since they do not have any good or bad prefixes. Such 
formulae are not monitorable. Note that the equivalence relation $\equiv$ is included 
in the equivalence relation $\equiv_{GB}$, which is in turn included in both $\equiv_G$ and $\equiv_B$. 
We will use the equivalences $\equiv_G$, $\equiv_B$, and $\equiv_{GB}$ to generate optimal monitors 
that detect good prefixes only, bad prefixes only, and both bad and good prefixes, 
respectively. We call these three equivalences monitoring equivalences. 
2.1 Derivatives 



We describe the notion of derivatives for LTL [9,10] based on the idea of state 
consumption: an LTL formula $\phi$ and a state $s$ generate another LTL formula, 
denoted by $\phi\{s\}$, with the property that for any finite trace $\alpha$, $s\alpha \not\models \phi$ if and 
only if $\alpha \not\models \phi\{s\}$, and $s\alpha \models \phi$ if and only if $\alpha \models \phi\{s\}$. We define the operator 
_{_} recursively through the following equations: 



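\[
\begin{array}{ll}
\mathit{false}\{s\} = \mathit{false} & \mathit{true}\{s\} = \mathit{true} \\
p\{s\} = \text{if } p \in AP(s) \text{ then } \mathit{true} \text{ else } \mathit{false} & (\neg\phi)\{s\} = \neg(\phi\{s\}) \\
(\phi_1 \vee \phi_2)\{s\} = \phi_1\{s\} \vee \phi_2\{s\} & (\phi_1 \wedge \phi_2)\{s\} = \phi_1\{s\} \wedge \phi_2\{s\} \\
(\phi_1 \rightarrow \phi_2)\{s\} = \phi_1\{s\} \rightarrow \phi_2\{s\} & (\phi_1 \oplus \phi_2)\{s\} = \phi_1\{s\} \oplus \phi_2\{s\} \\
(\Diamond\phi)\{s\} = \phi\{s\} \vee \Diamond\phi & (\Box\phi)\{s\} = \phi\{s\} \wedge \Box\phi \\
\multicolumn{2}{l}{(\phi_1 \;\mathcal{U}\; \phi_2)\{s\} = \phi_2\{s\} \vee (\phi_1\{s\} \wedge \phi_1 \;\mathcal{U}\; \phi_2)}
\end{array}
\]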
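
As a rough illustration of state consumption, the derivative equations can be transcribed into Python over formulae represented as nested tuples. This is a minimal sketch under stated assumptions, not the authors' implementation: the constructor names are invented for the example, only constant folding and idempotence are applied (not a full canonical form), implication and exclusive or are left out for brevity, and the rule used for the next operator, $(\circ\phi)\{s\} = \phi$, is the usual one and is an assumption here since it is not among the equations listed.

```python
TRUE, FALSE = ("true",), ("false",)

def ap(name): return ("ap", name)

def neg(f):
    if f == TRUE: return FALSE
    if f == FALSE: return TRUE
    return ("not", f)

def conj(f, g):
    if FALSE in (f, g): return FALSE
    if f == TRUE: return g
    if g == TRUE or f == g: return f
    return ("and", f, g)

def disj(f, g):
    if TRUE in (f, g): return TRUE
    if f == FALSE: return g
    if g == FALSE or f == g: return f
    return ("or", f, g)

def nxt(f): return ("next", f)
def ev(f): return ("eventually", f)
def alw(f): return ("always", f)
def until(f, g): return ("until", f, g)

def derive(f, s):
    """Consume one state s (the set of atomic propositions that hold in it)."""
    tag = f[0]
    if tag in ("true", "false"):
        return f
    if tag == "ap":
        return TRUE if f[1] in s else FALSE
    if tag == "not":
        return neg(derive(f[1], s))
    if tag == "and":
        return conj(derive(f[1], s), derive(f[2], s))
    if tag == "or":
        return disj(derive(f[1], s), derive(f[2], s))
    if tag == "next":                      # assumed rule: (next f){s} = f
        return f[1]
    if tag == "eventually":                # f{s} or eventually f
        return disj(derive(f[1], s), f)
    if tag == "always":                    # f{s} and always f
        return conj(derive(f[1], s), f)
    if tag == "until":                     # g{s} or (f{s} and f U g)
        return disj(derive(f[2], s), conj(derive(f[1], s), f))
    raise ValueError(f"unknown operator: {tag}")

a_until_b = until(ap("a"), ap("b"))
print(derive(a_until_b, {"a"}) == a_until_b)   # the obligation is unchanged
print(derive(a_until_b, {"b"}))                # collapses to ("true",)
```

Because the simplification here is only syntactic, two derivatives that are semantically equal may still be represented by different tuples; detecting such cases is the job of the equivalence checks discussed in the rest of the paper.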



We use the decision procedure for propositional calculus by Hsiang [13] to get a 
canonical form for a propositional formula. The procedure reduces a tautological 
formula to the constant true, a false formula to the constant false, and all other 
formulae to canonical forms modulo associativity and commutativity. An unusual 
aspect of this procedure is that the canonical forms consist of exclusive or ($\oplus$) 
of conjunctions. The procedure is given below using equations that are shown to 
be Church-Rosser and terminating modulo associativity and commutativity. 



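\[
\begin{array}{ll}
\mathit{true} \wedge \phi = \phi & \mathit{false} \wedge \phi = \mathit{false} \\
\phi \wedge \phi = \phi & \mathit{false} \oplus \phi = \phi \\
\phi \oplus \phi = \mathit{false} & \neg\phi = \mathit{true} \oplus \phi \\
\phi_1 \wedge (\phi_2 \oplus \phi_3) = (\phi_1 \wedge \phi_2) \oplus (\phi_1 \wedge \phi_3) & \phi_1 \vee \phi_2 = (\phi_1 \wedge \phi_2) \oplus \phi_1 \oplus \phi_2 \\
\phi_1 \rightarrow \phi_2 = \mathit{true} \oplus \phi_1 \oplus (\phi_1 \wedge \phi_2) & \phi_1 \equiv \phi_2 = \mathit{true} \oplus \phi_1 \oplus \phi_2
\end{array}
\]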



The exclusive or operator $\oplus$ and the $\wedge$ operator are defined as commutative 
and associative. The equations Derivative and Propositional Calculus 
when regarded as rewriting rules are terminating and Church-Rosser (modulo 
associativity and commutativity of $\wedge$ and $\oplus$), so they can be used as a functional 
procedure to calculate derivatives. 
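
To make the canonical form concrete, the following Python sketch represents a formula as an exclusive or of conjunctions: a frozenset of monomials, each monomial being a frozenset of atom names. The representation and the function names are assumptions made for this example, not the rewriting engine used in the paper. Associativity, commutativity and idempotence of $\wedge$ come for free from set union, and $\phi \oplus \phi = \mathit{false}$ from symmetric difference; in the LTL setting the "atoms" would also include subformulae with a temporal operator at the top.

```python
from itertools import product

TRUE = frozenset({frozenset()})    # the empty conjunction
FALSE = frozenset()                # the empty exclusive or

def atom(name):
    return frozenset({frozenset({name})})

def xor(p, q):
    return p ^ q                   # equal monomials cancel: phi (+) phi = false

def conj(p, q):
    # distribute conjunction over exclusive or, cancelling duplicates pairwise
    out = frozenset()
    for m, n in product(p, q):
        out = out ^ frozenset({m | n})
    return out

def neg(p):                        # not phi = true (+) phi
    return xor(TRUE, p)

def disj(p, q):                    # phi1 or phi2 = (phi1 and phi2) (+) phi1 (+) phi2
    return xor(xor(conj(p, q), p), q)

def implies(p, q):                 # phi1 -> phi2 = true (+) phi1 (+) (phi1 and phi2)
    return xor(xor(TRUE, p), conj(p, q))

def iff(p, q):                     # phi1 == phi2 = true (+) phi1 (+) phi2
    return xor(xor(TRUE, p), q)

a, b = atom("a"), atom("b")
print(disj(a, neg(a)) == TRUE)                      # a tautology reduces to true
print(neg(conj(a, b)) == disj(neg(a), neg(b)))      # De Morgan, checked by normal form
```

Two propositional formulae are equivalent exactly when their normal forms coincide, so equivalence checking reduces to comparing sets.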

In the rest of the paper, at several places we need to check if an LTL formula 
is equivalent to true or false. This can be done using the tableau-based proof 
method for LTL; the STeP tool at Stanford [18] has such an implementation. 

The following result gives a way to determine if a prefix is good or bad for a 
formula through derivations. 

Theorem 1. a) For any LTL formula $\phi$ and for any finite trace $\alpha = s_1 s_2 \cdots s_n$, 
$\alpha$ is a bad prefix for $\phi$ if and only if $\phi\{s_1\}\{s_2\}\cdots\{s_n\} \equiv \mathit{false}$. Similarly, $\alpha$ is 
a good prefix for $\phi$ if and only if $\phi\{s_1\}\{s_2\}\cdots\{s_n\} \equiv \mathit{true}$. b) The formula 
$\phi\{s_1\}\{s_2\}\cdots\{s_n\}$ needs $O(2^{size(\phi)})$ space to be stored. 

Proof. b): Due to the Boolean ring equations above regarded as simplification 
rules, any LTL formula is kept in a canonical form, which is an exclusive 
disjunction of conjunctions, where conjuncts have temporal operators at top. Moreover, 
after a series of applications of derivatives $s_1, s_2, \ldots, s_n$, the conjuncts in the 
normal form $\phi\{s_1\}\{s_2\}\cdots\{s_n\}$ are subterms of the initial formula $\phi$, each having a 
temporal operator at its top. Since there are at most $size(\phi)$ such subformulae, it 
follows that there are at most $2^{size(\phi)}$ possibilities to combine them in a 
conjunction. Therefore, one needs space $O(2^{size(\phi)})$ to store any exclusive disjunction of 
such conjunctions. This reasoning only applies to “idealistic” rewriting engines, 
which carefully optimize space needs during rewriting. □ 

In order to effectively generate optimal monitors, it is crucial to detect efficiently 
and as early as possible when two derivatives are equivalent. In the rest of 
the paper we use coinductive techniques to solve this problem. We define the 
operators $G : \mathrm{LTL} \rightarrow \{true, false\}$ and $B : \mathrm{LTL} \rightarrow \{true, false\}$ that return 
true if an LTL formula is equivalent ($\equiv$) to true or false respectively, and return 
false otherwise. We define an operator $GB : \mathrm{LTL} \rightarrow \{0, 1, ?\}$ that checks if an 
LTL formula $\phi$ is equivalent to false or true and returns 0 or 1, respectively, and 
returns ? if the formula is not equivalent to either true or false. 



3 Hidden Logic and Coinduction 

We use circular coinduction, defined rigorously in the context of hidden logics 
and implemented in the BOBJ system [22,5,6], to test whether two LTL formu- 
lae are good-bad prefix equivalent. A particularly appealing aspect of circular 
coinduction in the framework of LTL formula is that it not only shows that two 
LTL formulae are good-bad prefix equivalent, but also generates a larger set of 
good-bad prefix equivalent LTL formulae which will all be used in order to gen- 
erate the target monitor. Readers familiar with circular coinduction may assume 
the result in Theorem 4 and read Section 4 concurrently. 

Hidden logic is a natural extension of algebraic specification which benefits from 
a series of generalizations in order to capture various natural notions of 
behavioral equivalence found in the literature. It distinguishes visible sorts for data 
from hidden sorts for states, with states behaviorally equivalent if and only if 
they are indistinguishable under a formally given set of experiments. In order to 
keep the presentation simple and self-contained, we define a simplified version of 
hidden logic together with its associated circular coinduction proof rule which is 
nevertheless general enough to support the definition of LTL formulae and prove 
that they are behaviorally good and/or bad prefix equivalent. 

3.1 Algebraic Preliminaries 

We assume that the reader is familiar with basic equational logic and algebra but 
recall a few notions in order to make our notational conventions precise. An 
$S$-sorted signature $\Sigma$ is a set of sorts/types $S$ together with operational symbols 
on those, and a $\Sigma$-algebra $A$ is a collection of sets $\{A_s \mid s \in S\}$ and a collection 
of functions appropriately defined on those sets, one for each operational symbol. 
Given an $S$-sorted signature $\Sigma$ and an $S$-indexed set of variables $Z$, let $T_\Sigma(Z)$ 
denote the $\Sigma$-term algebra over variables in $Z$. If $V \subseteq S$ then $\Sigma{\restriction}_V$ is a $V$-sorted 
signature consisting of all those operations in $\Sigma$ with sorts entirely in $V$. We 
may let $\sigma(X)$ denote the term $\sigma(x_1, \ldots, x_n)$ when the number of arguments of $\sigma$ 
and their order and sorts are not important. If only one argument is important, 
then to simplify writing we place it at the beginning; for example, $\sigma(t, X)$ is a 
term having $\sigma$ as root with no important variables as arguments except one, in 
this case $t$. If $t$ is a $\Sigma$-term of sort $s'$ over a special variable $*$ of sort $s$ and $A$ is 
a $\Sigma$-algebra, then $A_t : A_s \rightarrow A_{s'}$ is the usual interpretation of $t$ in $A$. 



3.2 Behavioral Equivalence, Satisfaction and Specification 

Given disjoint sets $V$, $H$ called visible and hidden sorts, a hidden $(V, H)$-signature, 
say $\Sigma$, is a many sorted $(V \cup H)$-signature. A hidden subsignature of $\Sigma$ is a hidden 
$(V, H)$-signature $\Gamma$ with $\Gamma \subseteq \Sigma$ and $\Gamma{\restriction}_V = \Sigma{\restriction}_V$. The data signature is $\Sigma{\restriction}_V$. An 
operation of visible result not in $\Sigma{\restriction}_V$ is called an attribute, and a hidden sorted 
operation is called a method. 

Unless otherwise stated, the rest of this section assumes fixed a hidden 
signature $\Sigma$ with a fixed subsignature $\Gamma$. Informally, $\Sigma$-algebras are universes of 
possible states of a system, i.e., “black boxes,” for which one is only concerned 
with behavior under experiments with operations in $\Gamma$, where an experiment is 
an observation of a system attribute after perturbation. 

A $\Gamma$-context for sort $s \in V \cup H$ is a term in $T_\Gamma(\{* : s\})$ with one occurrence 
of $*$. A $\Gamma$-context of visible result sort is called a $\Gamma$-experiment. If $c$ is a context 
for sort $h$ and $t \in T_{\Sigma,h}$ then $c[t]$ denotes the term obtained from $c$ by substituting 
$t$ for $*$; we may also write $c[*]$ for the context itself. 

Given a hidden $\Sigma$-algebra $A$ with a hidden subsignature $\Gamma$, for sorts $s \in 
(V \cup H)$, we define $\Gamma$-behavioral equivalence of $a, a' \in A_s$ by $a \equiv_\Sigma^\Gamma a'$ iff $A_c(a) = 
A_c(a')$ for all $\Gamma$-experiments $c$; we may write $\equiv$ instead of $\equiv_\Sigma^\Gamma$ when $\Sigma$ and $\Gamma$ 
can be inferred from context. We require that all operations in $\Sigma$ are compatible 
with $\equiv_\Sigma^\Gamma$. Note that behavioral equivalence is the identity on visible sorts, since 
the trivial contexts $* : v$ are experiments for all $v \in V$. A major result in 
hidden logics, underlying the foundations of coinduction, is that $\Gamma$-behavioral 
equivalence is the largest equivalence which is the identity on visible sorts and which 
is compatible with the operations in $\Gamma$. 

Behavioral satisfaction of equations can now be naturally defined in terms 
of behavioral equivalence. A hidden $\Sigma$-algebra $A$ $\Gamma$-behaviorally satisfies a 
$\Sigma$-equation $(\forall X)\, t = t'$, say $e$, iff for each $\theta : X \rightarrow A$, $\theta(t) \equiv_\Sigma^\Gamma \theta(t')$; in this case 
we write $A \mid\!\equiv_\Sigma^\Gamma e$. If $E$ is a set of $\Sigma$-equations we then write $A \mid\!\equiv_\Sigma^\Gamma E$ when $A$ 
$\Gamma$-behaviorally satisfies each $\Sigma$-equation in $E$. We may omit $\Sigma$ and/or $\Gamma$ from 
$\mid\!\equiv_\Sigma^\Gamma$ when they are clear. 

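A behavioral $\Sigma$-specification is a triple $(\Sigma, \Gamma, E)$ where $\Sigma$ is a hidden 
signature, $\Gamma$ is a hidden subsignature of $\Sigma$, and $E$ is a set of $\Sigma$-sentences (equations). 
Non-data $\Gamma$-operations (i.e., in $\Gamma - \Sigma{\restriction}_V$) are called behavioral. A $\Sigma$-algebra $A$ 
behaviorally satisfies a behavioral specification $\mathcal{B} = (\Sigma, \Gamma, E)$ iff $A \mid\!\equiv_\Sigma^\Gamma E$, in 
which case we write $A \mid\!\equiv \mathcal{B}$; also $\mathcal{B} \mid\!\equiv e$ iff $A \mid\!\equiv \mathcal{B}$ implies $A \mid\!\equiv_\Sigma^\Gamma e$. 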

LTL can be very naturally defined as a behavioral specification. The enormous 
benefit of doing so is that behavioral inference, including most importantly 
coinduction, provides a decision procedure for good-bad prefix equivalence. 
Example 1. A behavioral specification of LTL defines a set of two visible sorts 
$V = \{Triple, State\}$, one hidden sort $H = \{Ltl\}$, one behavioral attribute 
$GB : Ltl \rightarrow Triple$ (defined as an operator in Subsection 2.1) and one behavioral 
method, the derivative, $\_\{\_\} : Ltl \times State \rightarrow Ltl$, together with all the other 
operations in Section 2 defining LTL, including the states in $S$ which are defined 
as visible constants of sort State, and all the equations in Subsection 2.1. The 
sort Triple consists of three constants 0, 1, and ?. We call this the LTL behavioral 
specification and we use $\mathcal{B}_{LTL/GB}$ to denote it. 

Since the only behavioral operators are the test for equivalence to true and 
false and the derivative, it follows that the experiments have exactly the form 
$GB(*\{s_1\}\{s_2\}\cdots\{s_n\})$, for any states $s_1, s_2, \ldots, s_n$. In other words, an 
experiment consists of a series of derivations followed by an application of the operator 
GB, and therefore two LTL formulae are behaviorally equivalent if and only if they 
cannot be distinguished by such experiments. Such behavioral equivalence is 
exactly the same as good-bad prefix equivalence. In the specification $\mathcal{B}_{LTL/GB}$, if 
we replace the attribute GB by B (or G), as defined in Subsection 2.1, the 
behavioral equivalence becomes the same as bad prefix (or good prefix) equivalence. 
We denote such specifications by $\mathcal{B}_{LTL/B}$ (or $\mathcal{B}_{LTL/G}$). Notice that the above 
reasoning applies within any algebra satisfying the presented behavioral 
specification. The one we are interested in is, of course, the free one, whose set carriers 
contain exactly the LTL formulae as presented in Section 2, and the operations 
have the obvious interpretations. We informally call it the LTL algebra. 

Letting $\equiv_b$ denote the behavioral equivalence relation generated on the LTL 
algebra, Theorem 1 immediately yields the following important result. 

Theorem 2. If $\phi_1$ and $\phi_2$ are two LTL formulae then $\phi_1 \equiv_b \phi_2$ in $\mathcal{B}_{LTL/GB}$ 
iff $\phi_1$ and $\phi_2$ are good-bad prefix equivalent. Similarly, $\phi_1 \equiv_b \phi_2$ in $\mathcal{B}_{LTL/B}$ (or 
$\mathcal{B}_{LTL/G}$) if and only if $\phi_1$ and $\phi_2$ are bad prefix (or good prefix) equivalent. 

This theorem allows us to prove good-bad prefix equivalence (or bad prefix or 
good prefix equivalence) of LTL formulae by making use of behavioral inference 
in the LTL behavioral specification $\mathcal{B}_{LTL/GB}$ (or $\mathcal{B}_{LTL/B}$ or $\mathcal{B}_{LTL/G}$) including 
(especially) circular coinduction. The next section shows how circular 
coinduction works and how it can be used to show LTL formulae good-bad prefix 
equivalent (or bad prefix equivalent or good prefix equivalent). From now onwards we 
will refer to $\mathcal{B}_{LTL/GB}$ simply as $\mathcal{B}$. 



3.3 Circular Coinduction as an Inference Rule 

In the simplified version of hidden logics defined above, the usual equational in- 
ference rules, i.e., reflexivity, symmetry, transitivity, substitution and congruence 
[22] are all sound for behavioral satisfaction. However, equational reasoning can 
derive only a very limited amount of interesting behavioral equalities. For that 
reason, circular coinduction has been developed as a very powerful automated 
technique to show behavioral equivalence. We let $\Vdash$ denote the relation being 
defined by the equational rules plus circular coinduction, for deduction from a 
specification to an equation. 
Before formally defining circular coinduction, we give the reader some 
intuitions by duality to structural induction. The reader who is only interested in 
using the presented procedure or who is not familiar with structural induction 
can skip this paragraph. Inductive proofs show equality of terms $t(x)$, $t'(x)$ over 
a given variable $x$ (seen as a constant) by showing $t(\sigma(x))$ equals $t'(\sigma(x))$ for 
all $\sigma$ in a basis, while circular coinduction shows terms $t$, $t'$ behaviorally 
equivalent by showing equivalence of $\delta(t)$ and $\delta(t')$ for all behavioral operations $\delta$. 
Coinduction applies behavioral operations at the top, while structural induction 
applies generator/constructor operations at the bottom. Both induction and 
circular coinduction assume some “frozen” instances of $t$, $t'$ equal when checking 
the inductive/coinductive step: for induction, the terms are frozen at the bottom 
by replacing the induction variable by a constant, so that no other terms can 
be placed beneath the induction variable, while for coinduction, the terms are 
frozen at the top, so that they cannot be used as subterms of other terms (with 
some important but subtle exceptions which are not needed here; see [6]). 

Freezing terms at the top is elegantly handled by a simple trick. Suppose 
every specification has a special visible sort $b$, and for each (hidden or visible) 
sort $s$ in the specification, a special operation $[\_] : s \rightarrow b$. No equations are 
assumed for these operations and no user defined sentence can refer to them; 
they are there for technical reasons. Thus, with just the equational inference 
rules, for any behavioral specification $\mathcal{B}$ and any equation $(\forall X)\, t = t'$, it is 
necessarily the case that $\mathcal{B} \Vdash (\forall X)\, t = t'$ iff $\mathcal{B} \Vdash (\forall X)\, [t] = [t']$. The rule 
below preserves this property. Let the sort of $t$, $t'$ be hidden; then 
Circular Coinduction: 

\[
\frac{\mathcal{B} \cup \{(\forall X)\, [t] = [t']\} \Vdash (\forall X, W)\, [\delta(t, W)] = [\delta(t', W)], \text{ for all appropriate } \delta \in \Gamma}
     {\mathcal{B} \Vdash (\forall X)\, t = t'}
\]

We call the equation $(\forall X)\, [t] = [t']$ added to $\mathcal{B}$ a circularity; it could just as 
well have been called a coinduction hypothesis or a co-hypothesis, but we find the 
first name more intuitive because from a coalgebraic point of view, coinduction 
is all about finding circularities. 

Theorem 3. The usual equational inference rules together with Circular 
Coinduction are sound. That means that if $\mathcal{B} \Vdash (\forall X)\, t = t'$ and $sort(t, t') \neq b$, or if 
$\mathcal{B} \Vdash (\forall X)\, [t] = [t']$, then $\mathcal{B} \mid\!\equiv (\forall X)\, t = t'$. 

Circular coinductive rewriting [5,6] iteratively rewrites proof tasks to their normal 
forms followed by a one-step coinduction if needed. Since the rules in $\mathcal{B}_{LTL/GB}$, 
$\mathcal{B}_{LTL/B}$, and $\mathcal{B}_{LTL/G}$ are ground Church-Rosser and terminating, this provides 
us with a decision procedure for good-bad prefix equivalence, bad prefix 
equivalence, and good prefix equivalence of LTL formulae respectively. 
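
The flavor of the procedure can be conveyed by a small Python sketch over an abstract deterministic system. It is a simplification under stated assumptions, not BOBJ's circular coinductive rewriting: terms are assumed to be already in normal form and hashable, `methods` stands for the behavioral methods (the derivative by each state, in the LTL case), `attribute` stands for the behavioral attribute (GB, in the LTL case), and the function name is hypothetical. A pair is assumed equivalent (a circularity) before its successors are checked, and the proof fails only when an experiment distinguishes the two sides.

```python
def circular_coinduction(x, y, methods, attribute, known=frozenset()):
    """Try to prove x and y behaviorally equivalent.

    Returns the set of circularities used in the proof, or None if some
    experiment (a sequence of methods followed by the attribute)
    distinguishes x from y. Terminates when finitely many terms are reachable.
    """
    circularities, todo = set(), [(x, y)]
    while todo:
        p, q = todo.pop()
        if p == q or (p, q) in circularities or (p, q) in known:
            continue                          # trivial, or closed by a circularity
        if attribute(p) != attribute(q):
            return None                       # a distinguishing experiment exists
        circularities.add((p, q))             # assume the goal, then check one step
        todo.extend((m(p), m(q)) for m in methods)
    return circularities


# Toy system (illustrative): counters modulo 2 and modulo 4, observed by parity.
def tick(state):
    kind, k = state
    return (kind, (k + 1) % (2 if kind == "mod2" else 4))

def parity(state):
    return state[1] % 2

proof = circular_coinduction(("mod2", 0), ("mod4", 0), [tick], parity)
print(sorted(proof))
print(circular_coinduction(("mod2", 0), ("mod4", 1), [tick], parity))
```

The returned set is the by-product that matters for monitor generation: every pair in it is itself a proven equivalence and can be passed back in as `known` for later proofs.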

Theorem 4. If $\phi_1$ and $\phi_2$ are two LTL formulae, then $\phi_1 \equiv_{GB} \phi_2$ if and only 
if $\mathcal{B}_{LTL/GB} \Vdash \phi_1 = \phi_2$. Similarly, if $\phi_1$ and $\phi_2$ are two LTL formulae, then 
$\phi_1 \equiv_B \phi_2$ (or $\phi_1 \equiv_G \phi_2$) if and only if $\mathcal{B}_{LTL/B} \Vdash \phi_1 = \phi_2$ (or $\mathcal{B}_{LTL/G} \Vdash 
\phi_1 = \phi_2$). Moreover, circular coinductive rewriting provides us with a decision 
procedure for good-bad prefix equivalence, bad prefix equivalence, and good prefix 
equivalence of LTL formulae. 
Proof. By soundness of behavioral reasoning (Theorem 3), one implication 
follows immediately via Theorem 2. For the other implication, assume that $\phi_1$ and 
$\phi_2$ are good-bad prefix equivalent (or good prefix or bad prefix equivalent, 
respectively) and that the equality $\phi_1 = \phi_2$ is not derivable from $\mathcal{B}_{LTL/GB}$ (or $\mathcal{B}_{LTL/G}$ 
or $\mathcal{B}_{LTL/B}$, respectively). By Theorem 1, the number of formulae into which 
any LTL formula can be derived via a sequence of events is finite, which means 
that the total number of equalities $\phi_1' = \phi_2'$ that can be derived via the circular 
coinduction rule is also finite. That implies that the only reason for which the 
equality $\phi_1 = \phi_2$ cannot be proved by circular coinduction is that it is in fact 
disproved by some experiment, which implies the existence of some events $a_1$, 
..., $a_n$ such that $GB(\phi_1\{a_1\}\cdots\{a_n\}) \neq GB(\phi_2\{a_1\}\cdots\{a_n\})$ (or the equivalent 
ones for B or G). However, this is obviously a contradiction because if $\phi_1$ and 
$\phi_2$ are good-bad (or good or bad) prefix equivalent then so are $\phi_1\{a_1\}\cdots\{a_n\}$ 
and $\phi_2\{a_1\}\cdots\{a_n\}$, and GB (or G or B) preserves this equivalence. 

4 Generating Optimal Monitors by Coinduction 

We now show how one can use the set of circularities generated by applying 
the circular coinduction rules in order to generate, from any LTL formula, an 
optimal monitor that can detect both good and bad prefixes. The optimal mon- 
itor thus generated will be a minimal deterministic finite automaton containing 
two final states true and false. We call such a monitor a GB-automaton. We 
conclude the section by modifying the algorithm to generate smaller monitors that 
can detect either bad or good prefixes. We call such monitors B-automata and 
G-automata, respectively. The main idea behind the algorithm is to associate 
states in the GB-automaton to LTL formulae obtained by deriving the initial LTL 
formula; when a new LTL formula is generated, it is tested for good-bad pre- 
fix equivalence with all the other already generated LTL formulae by using the 
coinductive procedure presented in the previous section. A crucial observation 
which significantly reduces the complexity of our procedure is that once a good- 
bad prefix equivalence is proved by circular coinductive rewriting, the entire set 
of circularities accumulated represent good-bad prefix equivalent LTL formu- 
lae. These can be used to quickly infer the other good-bad prefix equivalences, 
without having to generate the same circularities over and over again. 

Since BOBJ does not (yet) provide any mechanism to return the set of circu- 
larities accumulated after proving a given behavioral equivalence, we were unable 
to use BOBJ to implement our optimal monitor generator. Instead, we have im- 
plemented our own version of coinductive rewriting engine for LTL formulae, 
which is described below. 

We are given an initial LTL formula $\phi_0$ over atomic propositions $P$. Then 
$\sigma = 2^P$ is the set of possible states that can appear in an execution trace; note 
that $\sigma$ will be the alphabet of the GB-automaton. Now, from $\phi_0$ we want 
to generate a GB-automaton $D = (S, \sigma, \delta, s_0, \{true, false\})$, where $S$ is the set of 
states of the GB-automaton, $\delta : S \times \sigma \rightarrow S$ is the transition function, $s_0$ is the 
initial state of the GB-automaton, and $\{true, false\} \subseteq S$ is the set of final states 
of the DFA. The coinductive rewriting engine explicitly accumulates the proven 
circularities in a set. The set is initialized to an empty set at the beginning of the 
algorithm. It is updated with the accumulated circularities whenever we prove 
good-bad prefix equivalence of two LTL formulae in the algorithm. The algorithm 
maintains the set of states $S$ in the form of non good-bad prefix equivalent LTL 
formulae. At the beginning of the algorithm $S$ is initialized with two elements, 
the constant formulae true and false. Then, we check if the initial LTL formula 
$\phi_0$ is equivalent to true or false. If $\phi_0$ is equivalent to true or false, we set $s_0$ to 
true or false respectively and return $D$ as the GB-automaton. Otherwise, we set 
$s_0$ to $\phi_0$, add $\phi_0$ to the set $S$, and invoke the procedure dfs (see Fig. 1) on $\phi_0$. 

The procedure dfs generates the derivatives of a given formula $\phi$ for all 
$x \in \sigma$ one by one. A derivative $\phi_x = \phi\{x\}$ is added to the set $S$ if the set does 
not contain any LTL formula good-bad prefix equivalent to the derivative $\phi_x$. 
We then extend the transition function by setting $\delta(\phi, x) = \phi_x$ and recursively 
invoke dfs on $\phi_x$. On the other hand, if an LTL formula $\phi'$ equivalent to the 
derivative already exists in the set $S$, we extend the transition function by setting 
$\delta(\phi, x) = \phi'$. To check if an LTL formula good-bad prefix equivalent to the 
derivative $\phi_x$ already exists in the set $S$, we sequentially go through all the 
elements of the set $S$ and try to prove their good-bad prefix equivalence with $\phi_x$. In 
testing the equivalence we first add the set of accumulated circularities, $Eq_{all}$, 
to the initial $\mathcal{B}_{LTL/GB}$. Then we invoke the coinductive procedure. If for some LTL 
formula $\phi' \in S$ we are able to prove that $\phi' \equiv_{GB} \phi_x$, i.e., 
$\mathcal{B}_{LTL/GB} \cup Eq_{all} \cup Eq_{new} \Vdash \phi' = \phi_x$, then we add the new equivalences $Eq_{new}$, 
created by the coinductive procedure, to the set $Eq_{all}$ 
of circularities. Thus we reuse the already proven good-bad prefix equivalences 
in future proofs. 



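    $S \leftarrow \{true, false\}$

    dfs($\phi$)
    begin
      foreach $x \in \sigma$ do
        $\phi_x \leftarrow \phi\{x\}$;
        if $\exists \phi' \in S$ such that $\mathcal{B}_{LTL/GB} \cup Eq_{all} \cup Eq_{new} \Vdash \phi' = \phi_x$ then
          $\delta(\phi, x) = \phi'$; $Eq_{all} \leftarrow Eq_{all} \cup Eq_{new}$
        else $S \leftarrow S \cup \{\phi_x\}$; $\delta(\phi, x) = \phi_x$; dfs($\phi_x$); fi
      endfor
    end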

Fig. 1. LTL to optimal monitor generation algorithm 
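
The procedure of Fig. 1 can be sketched in Python as follows. This is an illustrative reading under stated assumptions rather than the authors' engine: `derive` and `gb` are supplied by the caller (standing for the derivative and the GB attribute), formulae are arbitrary hashable values with the two final states named "true" and "false", the helper `prove` is a simplified stand-in for circular coinductive rewriting, and the derivative table used in the demonstration is hypothetical.

```python
def prove(f, g, alphabet, derive, gb, eq_all):
    """Return the new circularities showing f and g good-bad prefix
    equivalent, or None if some experiment distinguishes them."""
    eq_new, todo = set(), [(f, g)]
    while todo:
        p, q = todo.pop()
        if p == q or {(p, q), (q, p)} & (eq_all | eq_new):
            continue                       # identical, or already a circularity
        if gb(p) != gb(q):
            return None
        eq_new.add((p, q))
        todo.extend((derive(p, x), derive(q, x)) for x in alphabet)
    return eq_new


def build_gb_automaton(phi0, alphabet, derive, gb):
    states, delta, eq_all = ["true", "false"], {}, set()

    def find(f):
        """Look for an existing state equivalent to f, reusing eq_all."""
        for g in states:
            eq_new = prove(g, f, alphabet, derive, gb, eq_all)
            if eq_new is not None:
                eq_all.update(eq_new)      # accumulate the proven circularities
                return g
        return None

    def dfs(f):
        for x in alphabet:
            fx = derive(f, x)
            g = find(fx)
            if g is None:                  # a genuinely new state
                states.append(fx)
                delta[(f, x)] = fx
                dfs(fx)
            else:
                delta[(f, x)] = g

    start = find(phi0)
    if start is None:
        start = phi0
        states.append(phi0)
        dfs(phi0)
    return states, delta, start


# Hypothetical derivative table in which p0 and p1 behave identically.
TABLE = {
    "p0": {"x": "p1", "y": "true"},
    "p1": {"x": "p0", "y": "true"},
    "true": {"x": "true", "y": "true"},
    "false": {"x": "false", "y": "false"},
}
states, delta, start = build_gb_automaton(
    "p0", ["x", "y"],
    derive=lambda f, x: TABLE[f][x],
    gb=lambda f: {"true": "1", "false": "0"}.get(f, "?"),
)
print(states, delta, start)
```

In the demonstration the derivative p1 is recognized as equivalent to p0, so no state is created for it and the resulting automaton has a self-loop on x.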



The GB-automaton generated by the procedure dfs may now contain some states
which are non-final and from which the GB-automaton can never reach a final
state. We remove these redundant states by doing a breadth-first search in the
backward direction from the final states. This can be done in time linear in the size
of the GB-automaton. If the resultant GB-automaton contains the initial state
$s_0$ then we say that the LTL formula is monitorable. That is, for the LTL formula
to be monitorable there must be a path from the initial state to a final state, i.e.
to the true or false state. Note that the GB-automaton may now contain non-final
states from which there may be no transition for some $x \in a$. Also note that no
transitions are possible from the final states.
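
The pruning step can be sketched as a backward breadth-first search. The representation below (a dictionary from `(state, symbol)` pairs to successor states, with final states named `"true"` and `"false"`) is an assumption made for illustration, not the authors' data structure.

```python
from collections import deque


def prune(start, delta, finals=("true", "false")):
    """Drop states that cannot reach a final state.

    Returns (alive states, pruned transition function, monitorable flag).
    """
    # Build predecessor lists so the search can run backwards.
    preds = {}
    for (src, _symbol), dst in delta.items():
        preds.setdefault(dst, set()).add(src)

    alive = set(finals)
    queue = deque(finals)
    while queue:
        state = queue.popleft()
        for p in preds.get(state, ()):
            if p not in alive:
                alive.add(p)
                queue.append(p)

    pruned = {
        (s, x): t for (s, x), t in delta.items() if s in alive and t in alive
    }
    return alive, pruned, start in alive


example = {
    ("q0", "a"): "q1",
    ("q0", "b"): "sink",
    ("q1", "a"): "true",
    ("sink", "a"): "sink",
}
alive, pruned, monitorable = prune("q0", example)
```

Each transition is visited a constant number of times, which is consistent with the linear-time claim above.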

The correctness of the algorithm is given by the following theorem. 

Theorem 5. If D is the GB-automaton generated for a given LTL formula $\phi$
by the above algorithm then

1) $\mathcal{L}(D)$ is the language of good and bad prefixes of $\phi$,

2) D is the minimal deterministic finite automaton accepting the good and bad
prefixes of $\phi$.

Proof. 1) Suppose $s_1 s_2 \ldots s_n$ is a good or bad prefix of $\phi$. Then by Theorem
1, $GB(\phi\{s_1\}\{s_2\} \ldots \{s_n\}) \in \{0, 1\}$. Let $\phi_i = \phi\{s_1\}\{s_2\} \ldots \{s_i\}$; then $\phi_{i+1} = \phi_i\{s_{i+1}\}$.
To prove that $s_1 s_2 \ldots s_n \in \mathcal{L}(D)$, we use induction to show that for
each $1 \le i \le n$, $\phi_i \equiv_{GB} \delta(\phi, s_1 s_2 \ldots s_i)$. For the base case, if $\phi_1 = \phi\{s_1\} \equiv_{GB} \phi$ then
dfs extends the transition function by setting $\delta(\phi, s_1) = \phi$. Therefore, $\phi_1 \equiv_{GB} \phi = \delta(\phi, s_1)$.
If $\phi_1 \not\equiv_{GB} \phi$ then dfs extends $\delta$ by setting $\delta(\phi, s_1) = \phi_1$. So $\phi_1 \equiv_{GB} \delta(\phi, s_1)$
holds in this case also. For the induction step let us assume that $\phi_i \equiv_{GB} \phi' = \delta(\phi, s_1 s_2 \ldots s_i)$.
If $\delta(\phi', s_{i+1}) = \phi''$ then from the dfs procedure we can see
that $\phi'' \equiv_{GB} \phi'\{s_{i+1}\}$. However, $\phi_i\{s_{i+1}\} \equiv_{GB} \phi'\{s_{i+1}\}$, since $\phi_i \equiv_{GB} \phi'$ by the
induction hypothesis. So $\phi_{i+1} \equiv_{GB} \phi'' = \delta(\phi', s_{i+1}) = \delta(\phi, s_1 s_2 \ldots s_{i+1})$. Also
notice that $GB(\phi_n) \in \{0, 1\}$ and $\phi_n \equiv_{GB} \delta(\phi, s_1 s_2 \ldots s_n)$; this implies that $\delta(\phi, s_1 s_2 \ldots s_n)$
is a final state and hence $s_1 s_2 \ldots s_n \in \mathcal{L}(D)$.

Now suppose $s_1 s_2 \ldots s_n \in \mathcal{L}(D)$. The proof that $s_1 s_2 \ldots s_n$ is a good or bad
prefix of $\phi$ goes in a similar way by showing that $\phi_i \equiv_{GB} \delta(\phi, s_1 s_2 \ldots s_i)$.

2) If the automaton D is not minimal then there exist at least two states p and
q in D such that p and q are equivalent [12], i.e. $\forall w \in a^* : \delta(p, w) \in F$ if and only
if $\delta(q, w) \in F$, where F is the set of final states. This means, if $\phi_1$ and $\phi_2$ are the
LTL formulae associated with p and q respectively in dfs then $\phi_1 \equiv_{GB} \phi_2$. But
dfs ensures that no two LTL formulae representing the states of the automaton
are good-bad prefix equivalent. So we get a contradiction. □

The GB-automaton thus generated can be used as a monitor for the given 
LTL formula. If at any point of monitoring we reach the state true in the GB- 
automaton we say that the monitored finite trace satisfies the LTL formula. 
If we reach the state false we say that the monitored trace violates the LTL 
formula. If we get stuck at some state i.e. we cannot take a transition, we say 
that the monitored trace is not monitorable. Otherwise we continue monitoring 
by consuming another state of the trace. 
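
A monitor loop over such an automaton might look like the following sketch. The dictionary-based transition function and the verdict strings are assumptions made for illustration; in particular the `"inconclusive"` verdict for a trace that ends in a non-final state is not named in the text.

```python
def run_monitor(start, delta, trace):
    """Consume trace states one by one and report a verdict."""
    state = start
    for event in trace:
        if state in ("true", "false"):
            break                        # verdict already reached
        if (state, event) not in delta:
            return "not monitorable"     # stuck: no transition available
        state = delta[(state, event)]
    if state == "true":
        return "satisfied"
    if state == "false":
        return "violated"
    return "inconclusive"                # trace ended before a verdict


automaton = {
    ("q0", "a"): "q1",
    ("q1", "a"): "true",
    ("q1", "b"): "false",
}
```

For example, `run_monitor("q0", automaton, ["a", "a", "b"])` stops as soon as the state `"true"` is reached and ignores the remaining event.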

In the above procedure if we use the specification $\mathcal{B}_{LTL/B}$ (or $\mathcal{B}_{LTL/G}$) instead
of $\mathcal{B}_{LTL/GB}$ and consider false (or true) as the only final state, we get a
B-automaton (or G-automaton). These automata can detect either bad or good
prefixes. Since the final state is either false or true the procedure to remove
redundant states will result in smaller automata compared to the corresponding
GB-automaton.

We have an implementation of the algorithm adapted to extended regular
expressions which is available for evaluation on the internet via a CGI server
reachable from http://fsl.cs.uiuc.edu/rv/.

5 Time and Space Complexity 

Any possible derivative of an LTL formula $\phi$, in its normal form, is an exclusive
or of conjunctions of temporal subformulae (subformulae having temporal
operators at the top) in $\phi$. The number of such temporal subformulae is $O(m)$, where
m is the size of $\phi$. Hence, by a counting argument, the number of possible
conjuncts is $O(2^m)$. The number of possible exclusive ors of these conjuncts is then
$O(2^{2^m})$. Therefore, the number of possible distinct derivatives of $\phi$ is $O(2^{2^m})$.
Since the number of states of the GB-automaton accepting good and bad prefixes
of $\phi$ cannot be greater than the number of derivatives, $2^{2^m}$ is an upper bound
on the number of possible states of the GB-automaton. Hence, the size of the
GB-automaton is $O(2^{2^m})$. Thus we get the following lemma:

Lemma 1. The size of the minimal GB-automaton accepting the good and bad
prefixes of any LTL formula of size m is $O(2^{2^m})$.

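For the lower bound on the size of the automata we consider the language

$$L_k = \{\sigma \# w \# \sigma' \$ w \mid w \in \{0, 1\}^k \text{ and } \sigma, \sigma' \in \{0, 1, \#\}^*\}.$$

This language was previously used in several works [15,16,23] to prove lower
bounds. The language can be expressed by the LTL formula [16] of size $O(k^2)$:

$$\phi_k = [(\neg \$) \; U \; (\$ \; U \; \bigcirc\Box(\neg \$))] \wedge \Diamond\Big[\# \wedge \bigcirc^{k+1}\# \wedge \bigwedge_{i=1}^{k} \big((\bigcirc^i 0 \wedge \Box(\$ \rightarrow \bigcirc^i 0)) \vee (\bigcirc^i 1 \wedge \Box(\$ \rightarrow \bigcirc^i 1))\big)\Big].$$

For this LTL formula the following result holds.

Lemma 2. Any GB-automaton accepting good and bad prefixes of $\phi_k$ will have
size $\Omega(2^{2^k})$.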

Proof: In order to prove the lower bound, the following equivalence relation
on strings over $(0 + 1 + \#)^*$ is useful. For a string $\sigma \in (0 + 1 + \#)^*$, define
$S(\sigma) = \{w \in (0 + 1)^k \mid \exists \lambda_1, \lambda_2 .\; \lambda_1 \# w \# \lambda_2 = \sigma\}$. We will say that $\sigma_1 \equiv_k \sigma_2$ iff
$S(\sigma_1) = S(\sigma_2)$. Now observe that the number of equivalence classes of $\equiv_k$ is $2^{2^k}$;
this is because for any $S \subseteq (0 + 1)^k$, there is a $\sigma$ such that $S(\sigma) = S$.

We will prove this lower bound by contradiction. Suppose A is a GB-automaton
that has a number of states less than $2^{2^k}$ for the LTL formula $\phi_k$.
Since the number of equivalence classes of $\equiv_k$ is $2^{2^k}$, by the pigeonhole
principle, there must be two strings $\sigma_1 \not\equiv_k \sigma_2$ such that the state of A after
reading $\sigma_1 \$$ is the same as the state after reading $\sigma_2 \$$. In other words, A will
reach the same state after reading inputs of the form $\sigma_1 \$ w$ and $\sigma_2 \$ w$. Now
since $\sigma_1 \not\equiv_k \sigma_2$, it follows that $(S(\sigma_1) \setminus S(\sigma_2)) \cup (S(\sigma_2) \setminus S(\sigma_1)) \neq \emptyset$. Take
$w \in (S(\sigma_1) \setminus S(\sigma_2)) \cup (S(\sigma_2) \setminus S(\sigma_1))$. Then clearly, exactly one out of $\sigma_1 \$ w$
and $\sigma_2 \$ w$ is in $L_k$, and so A gives the wrong answer on one of these inputs.
Therefore, A is not a correct GB-automaton. □

Combining the above two results we get the following theorem.

Theorem 6. The size of the minimal GB-automaton accepting the good and bad
prefixes of any LTL formula of size m is $O(2^{2^m})$ and $\Omega(2^{2^{\sqrt{m}}})$.

The space and time complexity of the algorithm is given by the following:

Theorem 7. The LTL to optimal monitor generation algorithm requires $2^{O(2^m)}$
space and $c \, 2^{O(2^m)}$ time for some constant c.

Proof: The number of distinct derivatives of an LTL formula of size m can be
$O(2^{2^m})$. Each such derivative can be encoded in space $O(2^m)$. So the number
of circularities that are generated in the algorithm can consume $O(2^{2^m} \cdot 2^m \cdot 2^m)$
space. The space required by the algorithm is thus $2^{O(2^m)}$. □

The number of iterations that the algorithm makes is less than the number of
distinct derivatives. In each iteration the algorithm generates a set of circularities
that can be at most $2^{O(2^m)}$. So the total time taken by the algorithm is $c \, 2^{O(2^m)}$
for some constant c.

6 Conclusion and Future Work 

In this paper we give a behavioral specification for LTL, which has the appealing 
property that two LTL formulae are equivalent with respect to monitoring if and 
only if they are indistinguishable under carefully chosen experiments. To our 
knowledge, this is the first coalgebraic formalization of LTL. The major bene- 
fit of this formalization is that one can use coinduction to prove LTL formulae 
monitoring-equivalent, which can further be used to generate optimal LTL
monitors in a single pass. As future work we want to apply our coinductive techniques
to generate monitors for other logics.

Abstract. Today, timed automata are the standard tool for specifying
and verifying real-time systems. On the other hand, probabilistic
timed automata have recently been developed in order to express the relative
likelihood of the system exhibiting certain behavior. In this paper, we
develop a verification method for timed simulation relations of
probabilistic timed automata, and apply this method to stepwise refinement
development of real-time systems. This kind of similarity is a valuable
theoretical tool for proving the soundness of refinements.



1 Introduction 

Today, timed automata [1] are the standard tool for specifying and verifying
real-time systems by model-checking methods [2,3]. On the other hand, in
order to express the relative likelihood of the system exhibiting certain behavior,
M. Kwiatkowska has developed probabilistic timed automata and their
model-checking method [4]. In this paper, we develop timed simulation relations of
probabilistic timed automata and an automatic verification method for them. This kind
of similarity is a valuable theoretical tool for proving the soundness of refinements.
There have been several refinement verification methods for timed automata, as
follows:

1. As timed language containment is undecidable [1], K. Cerans showed in 1992
that timed bisimulation is decidable [5].

2. S. Tasiran and his colleagues developed a compositional safety timed
simulation verification method in 1996 [6].

3. S. Yamane developed both $\forall$- and $\exists$-timed simulation verification
methods in 1998 [7].

On the other hand, there have been several refinement verification methods 
of probabilistic timed systems as follows: 

1. H.A. Hansson has developed both model-checking methods and bisimulation 
verification methods of discrete time probabilistic systems in 1991 [8]. 

2. R. Segala has developed the model of probabilistic timed automata, which 
is a dense time model, but is used in the context of manual simulation and 
bisimulation verification techniques in 1995 [9]. 
3. A notable contribution to the area of the verification of probabilistic systems
operating in dense time was offered by Alur, Courcoubetis and Dill [10,11],
who provided a model checking technique for a variant of Generalized
Semi-Markov Processes against timed properties in 1991.

To the best of our knowledge, timed simulation verification methods for
probabilistic timed automata have not been developed before (although we have
published a preliminary version of a simple timed simulation verification of
probabilistic timed automata [12]). In this paper, we develop timed simulation
verification methods for probabilistic timed automata with discrete probability
distributions, and apply our proposed methods to stepwise refinement
development of real-time systems.

The paper is organized as follows: In section 2, we introduce some preliminary
concepts and notations. In section 3, we define probabilistic timed automata. In
section 4, we define a timed simulation relation of probabilistic timed automata.
In section 5, we propose a verification method for a timed simulation relation. In
section 6, we apply our proposed methods to stepwise refinement development
of real-time systems. Finally, in section 7, we present conclusions.

2 Preliminaries 

In this section, we introduce some preliminary concepts and notations. 

First, we define discrete probability distributions as follows: 

Definition 1 (Discrete probability distribution) 

We denote the set of discrete probability distributions over a finite set S by $\mu(S)$.
Therefore, each $p \in \mu(S)$ is a function $p : S \to [0, 1]$ such that $\sum_{s \in S} p(s) = 1$. ■

Next we define Markov decision processes as follows: 

Definition 2 (Markov decision processes) 

A Markov decision process is denoted by $(Q, Steps)$, where Q is a set of states,
and $Steps : Q \to 2^{\mu(Q)}$ is a function assigning a set of probability distributions to
each state. Our intuition is that the Markov decision process traverses the state
space by making transitions determined by Steps; that is, in the state $q \in Q$, a
transition is made by first nondeterministically selecting a probability distribution
$p \in Steps(q)$, and then performing a probabilistic choice according to p as to
which state to move to. If the state selected by p is $q'$, then we denote such a transition
by $q \xrightarrow{p} q'$.

Occasionally we will require an additional event in the definition of Steps,
so that, for some set $\Sigma$ of events, $Steps : Q \to 2^{\Sigma \times \mu(Q)}$ is now a function assigning a
set of pairs $(\sigma, p)$, each comprising an event and a probability distribution, to each state
(a transition is now denoted by $q \xrightarrow{\sigma, p} q'$). ■
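
To make Definitions 1 and 2 concrete, here is a minimal Python sketch. The dictionary encoding of distributions and the `choose` callback that resolves nondeterminism are assumptions made for this example only.

```python
import math
import random


def is_distribution(p, tol=1e-9):
    """Check that p (a dict state -> probability) sums to 1 with values in [0, 1]."""
    in_range = all(0.0 <= v <= 1.0 for v in p.values())
    return in_range and math.isclose(sum(p.values()), 1.0, abs_tol=tol)


def step(steps, q, choose, rng):
    """One transition from q.

    steps  : dict state -> list of distributions (plays the role of Steps)
    choose : hypothetical scheduler picking one distribution (nondeterminism)
    rng    : random.Random instance resolving the probabilistic choice
    """
    p = choose(steps[q])
    targets = sorted(p)
    return rng.choices(targets, weights=[p[t] for t in targets], k=1)[0]


steps = {
    "q0": [{"q1": 1.0}, {"q0": 0.5, "q2": 0.5}],
    "q1": [{"q1": 1.0}],
    "q2": [{"q2": 1.0}],
}
rng = random.Random(0)
first = step(steps, "q0", lambda ds: ds[0], rng)
second = step(steps, "q0", lambda ds: ds[1], rng)
```

Separating `choose` from `rng` mirrors the two-phase reading of a transition: a nondeterministic selection followed by a probabilistic choice.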

Next we define clocks and clock valuations as follows: 

Definition 3 (Clocks and clock valuations) 

A clock is a real-valued variable which increases at the same rate as real-time.
Let $\chi = \{x_1, \ldots, x_n\}$ be a set of clocks. A valuation of $\chi$ is a vector a of length
n in $\mathbb{R}^n$, which, intuitively, assigns a real value $a_i$ to each variable $x_i \in \chi$. We
denote the set of all clock valuations of $\chi$ by $\mathbb{R}^{\chi}$ or $\mathbb{R}^n$. A valuation can also
be regarded as defining a point in $\mathbb{R}^n$-space. We write 0 for the valuation which
assigns 0 to each variable $x \in \chi$, and $a + \delta$ to represent the valuation which
assigns $a_i + \delta$ to each variable $x_i \in \chi$, for the valuation $a \in \mathbb{R}^n$ and the real
value $\delta \in \mathbb{R}_{\geq 0}$.

Next we define the reset operation on valuations as follows: Let $X \subseteq \chi$ be a subset
of variables, and $a \in \mathbb{R}^n$ be a valuation. Then $a[X := 0]$ is the valuation $a'$ such
that, for all $x_i \in X$, we have $a'_i = 0$ and, for all
other $x_i \in \chi \setminus X$, we have $a'_i = a_i$. ■
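
The two operations on valuations, time elapse and reset, can be illustrated with a short sketch. Representing a valuation as a tuple of floats and a reset set as a set of 0-based clock indices is an assumption of this example.

```python
def delay(a, d):
    """Return the valuation a + d: every clock advances by d."""
    return tuple(v + d for v in a)


def reset(a, X):
    """Return a[X := 0]: clocks whose index is in X are set to 0."""
    return tuple(0.0 if i in X else v for i, v in enumerate(a))


a = (0.0, 1.5)
after_delay = delay(a, 0.5)              # both clocks advance by 0.5
after_reset = reset(after_delay, {0})    # only the first clock is reset
```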

Next we define clock constraints as follows: 

Definition 4 (Clock constraints) 

A constraint over $\chi$ is an expression of the form $x_i \sim c$ or $x_i - x_j \sim c$, where
$0 \le i \neq j \le n$, $\sim \in \{<, \le, \ge, >\}$, $c \in \mathbb{N} \cup \{\infty\}$, $x_0 = 0$. ■

Next we define zones as follows: 

Definition 5 (Zones) 

A zone of $\chi$, written $\zeta$, is a convex subset of the valuation space $\mathbb{R}^n$ described by
a conjunction of constraints. Formally, a zone $\zeta$ is the set of valuations which
satisfy the conjunction of $n \cdot (n + 1)$ constraints given by:

$$\bigwedge_{0 \le i \neq j \le n} x_i - x_j \sim c_{ij}$$

Let $Z_{\chi}$ be the set of all zones of $\chi$. We denote by $c_{max}(\zeta)$ the largest constant
used in the description of a zone.

For any zone $\zeta \in Z_{\chi}$, and $X \subseteq \chi$ a set of clocks, let $\zeta[X := 0] = \{a[X := 0] \mid a \in \zeta\}$.

We write $a \models \zeta$ if the valuation a satisfies the zone $\zeta \in Z_{\chi}$. ■
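
As an illustration of zone membership, the sketch below checks a valuation against a conjunction of difference constraints. The encoding of a zone as a dictionary from index pairs `(i, j)` to `(operator, constant)` is an assumption of this example; index 0 plays the role of the constant clock $x_0 = 0$.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, ">=": operator.ge, ">": operator.gt}


def satisfies(a, zone):
    """Return True if valuation a satisfies every constraint x_i - x_j ~ c."""
    ext = (0.0,) + tuple(a)        # prepend x_0 = 0
    return all(
        OPS[op](ext[i] - ext[j], c) for (i, j), (op, c) in zone.items()
    )


# The zone x_1 <= 2 and x_1 - x_2 < 1 over two clocks.
zone = {(1, 0): ("<=", 2), (1, 2): ("<", 1)}
```

A constraint on a single clock is written as a difference with index 0, so `(1, 0): ("<=", 2)` encodes $x_1 \le 2$.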

3 Probabilistic Timed Automata 

In this section, we introduce probabilistic timed automata as a modelling frame- 
work for real-time systems with probability [4]. 

3.1 Syntax and Semantics of Probabilistic Timed Automata 

First we define syntax of probabilistic timed automata. We extend timed au- 
tomata with discrete probability distributions over edges, so that the choice of 
the next location of the automaton is now probabilistic, in addition to nonde- 
terministic, in nature. 

Definition 6 (Syntax of probabilistic timed automata) 

A probabilistic timed automaton is a tuple $G = (S, \Sigma, \bar{s}, \chi, inv, prob, \langle \tau_s \rangle_{s \in S})$
which contains:

1. a finite set S of nodes,

2. a finite set $\Sigma$ of events,

3. a start node $\bar{s} \in S$,

4. a finite set $\chi$ of clocks,

5. a function $inv : S \to Z_{\chi}$ assigning to each node an invariant condition,

6. a function $prob : S \to 2^{\Sigma \times 2^{\chi} \times \mu(S \times \mathbb{R}^n)}$ assigning to each node a set of
triples $(\sigma, \lambda, p)$, each pairing an element of $\Sigma \times 2^{\chi}$ with a discrete probability
distribution on $S \times \mathbb{R}^n$,

7. a family of functions $\langle \tau_s \rangle_{s \in S}$ where, for any $s \in S$, $\tau_s : prob(s) \to Z_{\chi}$
assigns to each $(\sigma, \lambda, p) \in prob(s)$ an enabling condition, where $\sigma \in \Sigma$,
$p \in \mu(S \times \mathbb{R}^n)$, and $\lambda$ gives the clocks to be reset with this function. ■

Next we informally define the semantics of probabilistic timed automata. A state
of a probabilistic timed automaton $G = (S, \Sigma, \bar{s}, \chi, inv, prob, \langle \tau_s \rangle_{s \in S})$ is a pair
$\langle s, a \rangle \in S \times \mathbb{R}^n$, where $s \in S$ is a node, and $a \in \mathbb{R}^n$ is a clock valuation such
that $a \in inv(s)$. The set of states $\langle s, a \rangle$ is denoted by $\Pi$. The distributions available
for nondeterministic choice in a state $\langle s, a \rangle$ can be categorised as corresponding either
to time transitions or probabilistic discrete transitions:

1. A time transition of duration $\delta \in \mathbb{R}_{\geq 0}$ is possible if the clock values obtained
by adding $\delta$ to each of the current values satisfy the invariant condition
associated with the current node. More formally, if $\langle s, a \rangle$ is the current
state of the system, then a time transition of duration $\delta$ to the state
$\langle s, a + \delta \rangle$ is possible if $a + \delta \in inv(s)$.

2. A probabilistic discrete transition corresponding to the distribution p is
possible if and only if p belongs to the set of distributions associated with the
current node, and the current clock values satisfy the enabling condition
of p. That is, if $\langle s, a \rangle$ is the current state of the system, then a
probabilistic transition corresponding to the distribution p can be performed if
$(\sigma, \lambda, p) \in prob(s)$ and $a \in \tau_s((\sigma, \lambda, p))$.



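Paths in a probabilistic timed automaton based on the above semantics arise
by resolving both the nondeterministic and probabilistic choices. A path of the
probabilistic timed automaton $G = (S, \Sigma, \bar{s}, \chi, inv, prob, \langle \tau_s \rangle_{s \in S})$ is a
non-empty finite or infinite sequence:

$$
\begin{aligned}
\omega = {} & \langle s_0, \mathbf{0} \rangle \xrightarrow{\delta_0} \langle s_0, \mathbf{0} + \delta_0 \rangle
\xrightarrow{\sigma_0, \lambda_0, p_0, \tau_{s_0}(p_0)} \langle s_1, (\mathbf{0} + \delta_0)[\lambda_0 := 0] \rangle \\
& \xrightarrow{\delta_1} \langle s_1, (\mathbf{0} + \delta_0)[\lambda_0 := 0] + \delta_1 \rangle
\xrightarrow{\sigma_1, \lambda_1, p_1, \tau_{s_1}(p_1)} \langle s_2, ((\mathbf{0} + \delta_0)[\lambda_0 := 0] + \delta_1)[\lambda_1 := 0] \rangle \\
& \xrightarrow{\delta_2} \langle s_2, ((\mathbf{0} + \delta_0)[\lambda_0 := 0] + \delta_1)[\lambda_1 := 0] + \delta_2 \rangle
\xrightarrow{\sigma_2, \lambda_2, p_2, \tau_{s_2}(p_2)} \cdots
\end{aligned}
$$

where $s_i \in S$, $(\sigma_i, \lambda_i, p_i) \in prob(s_i)$, $\sigma_i \in \Sigma$, $\delta_i \in \mathbb{R}_{\geq 0}$ for all $i \ge 0$, and $\bar{s} = s_0$. $\lambda_i \subseteq \chi$
gives the set of clocks to be reset.

We can also write $\omega$ as follows:

$$
\omega = \langle s_0, \mathbf{0} \rangle \xrightarrow{\delta_0, \sigma_0, \lambda_0, p_0, \tau_{s_0}(p_0)} \langle s_1, (\mathbf{0} + \delta_0)[\lambda_0 := 0] \rangle
\xrightarrow{\delta_1, \sigma_1, \lambda_1, p_1, \tau_{s_1}(p_1)} \langle s_2, ((\mathbf{0} + \delta_0)[\lambda_0 := 0] + \delta_1)[\lambda_1 := 0] \rangle
\xrightarrow{\delta_2, \sigma_2, \lambda_2, p_2, \tau_{s_2}(p_2)} \cdots
$$

We can also write $\omega$ as follows:

$$
\omega = \langle s_0, a^0 \rangle \xrightarrow{\delta_0, \sigma_0, \lambda_0, p_0, \tau_{s_0}(p_0)} \langle s_1, a^1 \rangle
\xrightarrow{\delta_1, \sigma_1, \lambda_1, p_1, \tau_{s_1}(p_1)} \langle s_2, a^2 \rangle
\xrightarrow{\delta_2, \sigma_2, \lambda_2, p_2, \tau_{s_2}(p_2)} \cdots
$$

where $a^0 = \mathbf{0}$, $a^1 = (\mathbf{0} + \delta_0)[\lambda_0 := 0]$, $a^2 = ((\mathbf{0} + \delta_0)[\lambda_0 := 0] + \delta_1)[\lambda_1 := 0]$.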

3.2 Parallel Composition of Probabilistic Timed Automata 

Next we define parallel composition of probabilistic timed automata. The key 
operator to build complex real-time systems from simpler ones is parallel com- 
position. 

Definition 7 (Parallel composition of probabilistic timed automata) 

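Given $G_1 = (S_1, \Sigma_1, \bar{s}_1, \chi_1, inv_1, prob_1, \langle \tau_{s_1} \rangle_{s_1 \in S_1})$ and $G_2 = (S_2, \Sigma_2, \bar{s}_2, \chi_2,
inv_2, prob_2, \langle \tau_{s_2} \rangle_{s_2 \in S_2})$, the parallel composition of $G_1$ and $G_2$ is denoted by
$G_1 \parallel G_2$. We assume that $\chi_1$ and $\chi_2$ are disjoint. $G = (S, \Sigma, \bar{s}, \chi, inv, prob,
\langle \tau_s \rangle_{s \in S})$ consists of the following:

1. $S = S_1 \times S_2$

2. $\Sigma = \Sigma_1 \cup \Sigma_2$

3. $\bar{s} = (\bar{s}_1, \bar{s}_2)$

4. $\chi = \chi_1 \cup \chi_2$

5. $inv((s_1, s_2)) = inv_1(s_1) \wedge inv_2(s_2)$, for any $s_1 \in S_1$ and $s_2 \in S_2$

6. Both $prob : S \to 2^{\Sigma \times 2^{\chi} \times \mu(S \times \mathbb{R}^n)}$ and the family of functions
$\langle \tau_{(s_1, s_2)} \rangle_{(s_1, s_2) \in S_1 \times S_2}$ are defined as follows:

(a) if $a \in \Sigma_1$ and $a \in \Sigma_2$:
$(a, \lambda_1, p_1) \in prob_1(s_1)$ exists if $a \in \Sigma_1$, and $(a, \lambda_2, p_2) \in prob_2(s_2)$ exists
if $a \in \Sigma_2$. We can define $(a, \lambda_1 \cup \lambda_2, p_1 \otimes p_2) \in prob((s_1, s_2))$, where
$s_1 \in S_1$, $s_2 \in S_2$. Here we can define $p_1 \otimes p_2$ by $p_1(s_1) \times p_2(s_2)$.
Moreover, we can define $\tau_{(s_1, s_2)} : prob((s_1, s_2)) \to Z_{\chi}$ assigning an
enabling condition to each $(a, \lambda_1 \cup \lambda_2, p_1 \otimes p_2) \in prob((s_1, s_2))$, where
$Z_{\chi} = Z_{\chi_1} \times Z_{\chi_2}$.

(b) if $a \in \Sigma_1$ and $b \in \Sigma_2$ ($a \neq b$):
$(a, \lambda_1, p_1) \in prob_1(s_1)$ exists if $a \in \Sigma_1$, and $(b, \lambda_2, p_2) \in prob_2(s_2)$ exists if
$b \in \Sigma_2$. We can define $(a, \lambda_1, p_1) \in prob((s_1, s_2))$ and we can define
$(b, \lambda_2, p_2) \in prob((s_1, s_2))$, where $s_1 \in S_1$, $s_2 \in S_2$.
Moreover, we can define $\tau_{(s_1, s_2)} : prob((s_1, s_2)) \to Z_{\chi_1}$ assigning an
enabling condition to each $(a, \lambda_1, p_1) \in prob((s_1, s_2))$, and we can define
$\tau_{(s_1, s_2)} : prob((s_1, s_2)) \to Z_{\chi_2}$ assigning an enabling condition to each
$(b, \lambda_2, p_2) \in prob((s_1, s_2))$. ■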
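
The product distribution used for a synchronised event can be sketched as follows. For simplicity this example treats distributions as dictionaries over target nodes only and ignores the clock component, which is an assumption of the sketch rather than part of the definition.

```python
import math


def product(p1, p2):
    """Product distribution: (s1, s2) gets probability p1(s1) * p2(s2)."""
    return {(s1, s2): v1 * v2 for s1, v1 in p1.items() for s2, v2 in p2.items()}


p1 = {"a": 0.5, "b": 0.5}
p2 = {"x": 0.25, "y": 0.75}
joint = product(p1, p2)
```

Because both factors sum to 1, the product again sums to 1, so it is a discrete probability distribution over pairs of nodes.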

4 A Timed Simulation Relation 

In this section, we define a timed simulation relation between two probabilistic 
timed automata. 

First, we define a simulation relation between two probability distributions,
borrowed from [13].

Definition 8 (A simulation relation between probability distributions)

Let $(p_0, p_1) \in R$ be a relation between two probability distributions $p_0, p_1$.
Formally, $(p_0, p_1) \in R$ if there is a function $w : Q_0 \times Q_1 \to [0, 1]$ such that

1. if $w(q_0, q_1) > 0$ then $(q_0, q_1) \in R$, where $q_0 \in Q_0$ and $q_1 \in Q_1$.

2. for each $q_0 \in Q_0$, $\sum_{q_1 \in Q_1} w(q_0, q_1) = p_0(q_0)$.

3. for each $q_1 \in Q_1$, $\sum_{q_0 \in Q_0} w(q_0, q_1) = p_1(q_1)$.

where $Q_0$ and $Q_1$ are finite sets. The function w is called a weight function. ■
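
The three conditions on a weight function can be checked mechanically. The sketch below assumes dictionary encodings for `w`, `p0` and `p1` and a set of pairs for the state relation `R`; these encodings are illustrative choices, not part of the definition.

```python
import math


def is_weight_function(w, p0, p1, R, tol=1e-9):
    """Check the three weight-function conditions for distributions p0 and p1."""
    for (q0, q1), v in w.items():
        if not 0.0 <= v <= 1.0:
            return False
        if v > 0 and (q0, q1) not in R:      # condition 1
            return False
    for q0, mass in p0.items():              # condition 2: row sums give p0
        row = sum(v for (a, _b), v in w.items() if a == q0)
        if not math.isclose(row, mass, abs_tol=tol):
            return False
    for q1, mass in p1.items():              # condition 3: column sums give p1
        col = sum(v for (_a, b), v in w.items() if b == q1)
        if not math.isclose(col, mass, abs_tol=tol):
            return False
    return True


p0 = {"a": 0.5, "b": 0.5}
p1 = {"x": 1.0}
w = {("a", "x"): 0.5, ("b", "x"): 0.5}
```

Here both source states send all their mass to `"x"`, so `w` is a weight function exactly when both `("a", "x")` and `("b", "x")` are related.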

Next, we define a timed simulation relation by combining Segala's
simulation relation between probability distributions [13] with Tasiran's timed
simulation relation [6] as follows:

Definition 9 (A timed simulation relation) 

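Given two probabilistic timed automata $G_1 = (S_1, \Sigma_1, \bar{s}_1, \chi_1, inv_1, prob_1,
\langle \tau_{s_1} \rangle_{s_1 \in S_1})$ and $G_2 = (S_2, \Sigma_2, \bar{s}_2, \chi_2, inv_2, prob_2, \langle \tau_{s_2} \rangle_{s_2 \in S_2})$, a timed
simulation relation from $G_1$ to $G_2$ is a binary relation $R \subseteq \Pi_1 \times \Pi_2$, denoted by
$G_1 \preceq G_2$, if the following three conditions are satisfied, where $\Pi_1$ is the set of
states $\langle s_1, a \rangle$ and $\Pi_2$ is the set of states $\langle s_2, b \rangle$, $s_1 \in S_1$, $a : \chi_1 \to \mathbb{R}$, $s_2 \in S_2$,
$b : \chi_2 \to \mathbb{R}$.

1. A timed simulation condition:
for every $(\langle s_1, a \rangle, \langle s_2, b \rangle) \in R$, and for every $\delta$, $\sigma$, $\lambda_1$ and $\tau_{s_1}(p_1)$, if

$$\langle s_1, a \rangle \xrightarrow{\delta, \sigma, \lambda_1, \tau_{s_1}(p_1)} \langle s_1', a' \rangle$$

then there exists $\langle s_2', b' \rangle$ such that

$$\langle s_2, b \rangle \xrightarrow{\delta, \sigma, \lambda_2, \tau_{s_2}(p_2)} \langle s_2', b' \rangle$$

and $(\langle s_1', a' \rangle, \langle s_2', b' \rangle) \in R$, where $a + \delta \models \tau_{s_1}(p_1)$ and $b + \delta \models \tau_{s_2}(p_2)$,
$(\sigma, \lambda_1, p_1) \in prob(s_1)$ and $(\sigma, \lambda_2, p_2) \in prob(s_2)$, $a' = (a + \delta)[\lambda_1 := 0]$ and
$b' = (b + \delta)[\lambda_2 := 0]$.

2. A probability distribution simulation condition:
for every $(\langle s_1, a \rangle, \langle s_2, b \rangle) \in R$ and every transition

$$\langle s_1, a \rangle \xrightarrow{\delta, \sigma, \lambda_1, p_1, \tau_{s_1}(p_1)} \langle s_1', a' \rangle$$

of $G_1$, there exists a transition

$$\langle s_2, b \rangle \xrightarrow{\delta, \sigma, \lambda_2, p_2, \tau_{s_2}(p_2)} \langle s_2', b' \rangle$$

of $G_2$ such that $(p_1, p_2) \in R$.

3. An initial condition:
for every $\langle \bar{s}_1, \mathbf{0} \rangle$, there exists an initial state $\langle \bar{s}_2, \mathbf{0} \rangle$ with
$(\langle \bar{s}_1, \mathbf{0} \rangle, \langle \bar{s}_2, \mathbf{0} \rangle) \in R$. ■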

5 Verification Method of a Timed Simulation Relation

In this section, we propose an automatic verification method for a timed simulation
relation of probabilistic timed automata. We achieve this by converting this
verification to a finite check on the finitely many equivalence classes of an equivalence
relation.



5.1 Equivalence of Clock Valuations 

We first define an equivalence relation on the space of clock valuations in order 
to define region graphs in the next subsection. 

Definition 10 (Agreement with integral parts of clock valuations) 

For any $t \in \mathbb{R}$, $\lfloor t \rfloor$ denotes the integral part of t. Then, for any $t, t' \in \mathbb{R}$, t and
$t'$ agree on their integral parts if and only if:

1. $\lfloor t \rfloor = \lfloor t' \rfloor$

2. both t and $t'$ are integers or neither is an integer. ■



Definition 11 (Clock equivalence) 

The valuations $a, b \in \mathbb{R}^n$ are clock equivalent, denoted by $a \cong b$, if and only if
they satisfy the following conditions:

1. $\forall x_i \in \chi$, either $a_i$ and $b_i$ agree on their integral parts, or both $a_i > c$ and
$b_i > c$, and

2. $\forall x_i, x_j \in \chi$, either $a_i - a_j$ and $b_i - b_j$ agree on their integral parts, or both
$a_i - a_j > c$ and $b_i - b_j > c$,

where $c = c_{max}(G)$. $c_{max}(G)$ denotes the largest constant appearing in the
probabilistic timed automaton G. ■
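
Definitions 10 and 11 translate directly into a small check. The sketch below follows the two conditions as stated in the text; valuations are tuples of floats and `c` is passed in explicitly, both of which are choices made for this example.

```python
import math


def agree(t, u):
    """True if t and u have the same integral part and are both or neither integers."""
    same_floor = math.floor(t) == math.floor(u)
    return same_floor and (float(t).is_integer() == float(u).is_integer())


def clock_equivalent(a, b, c):
    """Clock equivalence of valuations a and b with largest constant c."""
    n = len(a)
    for i in range(n):
        if not (agree(a[i], b[i]) or (a[i] > c and b[i] > c)):
            return False
    for i in range(n):
        for j in range(n):
            if i != j:
                da, db = a[i] - a[j], b[i] - b[j]
                if not (agree(da, db) or (da > c and db > c)):
                    return False
    return True
```

The example values used to exercise this sketch are chosen to be exactly representable in binary floating point; a production implementation would more likely use rational arithmetic to avoid rounding issues near integer boundaries.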

Let $[a]$ denote the equivalence class of $\cong$ to which a belongs. We refer to
elements such as $\langle s, [a] \rangle$ as regions.



5.2 Region Graph 

We construct a region graph based on regions. We define a region graph [4] which 
captures both the probabilistic transitions and the movement to new regions due 
to the passage of time, which takes the form of a Markov decision process. 

First we define successor class and successor region in order to construct a 
region graph. 

Definition 12 (Successor class and successor region) 

Let $\alpha$ and $\beta$ be distinct equivalence classes of $\mathbb{R}^n$. We define the successor class as
follows:

The equivalence class $\beta$ is said to be the successor of $\alpha$ if and only if, for each
$a \in \alpha$, there exists a positive $t \in \mathbb{R}$ such that $a + t \in \beta$ and $a + t' \in \alpha \cup \beta$
for all $t' < t$.

The successor relation can be extended to regions in the following way:

$\langle s', \beta \rangle$ is the successor region of $\langle s, \alpha \rangle$ if $s' = s$ and $\beta = succ(\alpha)$. ■

Next we define region graph of probabilistic timed automaton as Markov 
decision process. 

Definition 13 (Region graph) 

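The region graph R(G) is defined to be the Markov decision process $(V^*, Steps^*)$
corresponding to the probabilistic timed automaton $G = (S, \Sigma, \bar{s}, \chi, inv, prob,
\langle \tau_s \rangle_{s \in S})$, where $V^*$ is the set of regions, and $Steps^* : V^* \to 2^{\Sigma^* \times \mu(V^*)}$
includes two types of transitions, where $\Sigma^* = \Sigma \cup \{succ\}$. For
each region $\langle s, \alpha \rangle \in V^*$:

1. Passage of Time:
if the invariant condition $inv(s)$ is satisfied by $succ(\alpha)$, then $p^{s,\alpha}_{succ} \in Steps^*(\langle s, \alpha \rangle)$,
where for any $\langle s', \beta \rangle \in V^*$:

$$
p^{s,\alpha}_{succ} \langle s', \beta \rangle =
\begin{cases}
1 & \text{if } \langle s', \beta \rangle = \langle s, succ(\alpha) \rangle \\
0 & \text{otherwise.}
\end{cases}
$$

We denote $\langle s, \alpha \rangle \xrightarrow{succ} \langle s', \beta \rangle$ if $\langle s', \beta \rangle = \langle s, succ(\alpha) \rangle$.

2. Discrete Transitions:
$p^{s,\alpha} \in Steps^*(\langle s, \alpha \rangle)$ if there exists $(\sigma, \lambda, p) \in prob(s)$ and $\alpha$ satisfies
$\tau_s(p)$ such that for any $s' \in S$ and equivalence class $\beta$:

$$p^{s,\alpha} \langle s', \beta \rangle = \sum_{X \subseteq \chi \text{ and } \alpha[X := 0] = \beta} p(s', X),$$

where for any $s' \in S$ and $X \subseteq \chi$, the probability that G will make a state transition
to node $s'$, and reset all the clocks in X to 0, is given by $p(s', X)$.

We denote this by $\langle s, \alpha \rangle \xrightarrow{\sigma, p} \langle s', \beta \rangle$. ■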



5.3 A Region Simulation Relation 

We first define a region simulation relation on the region graph, and then check a
timed simulation by reducing the problem to a finite check on the equivalence
classes of a relation defined on region graphs.

Given two probabilistic timed automata $G_1 = (S_1, \Sigma, \bar{s}_1, \chi_1, inv_1, prob_1,
\langle \tau_{s_1} \rangle_{s_1 \in S_1})$ and $G_2 = (S_2, \Sigma, \bar{s}_2, \chi_2, inv_2, prob_2, \langle \tau_{s_2} \rangle_{s_2 \in S_2})$, we define the parallel
composition $G_1 \parallel G_2$ of $G_1$ and $G_2$ as $G = (S, \Sigma, \bar{s}, \chi, inv, prob, \langle \tau_s \rangle_{s \in S})$.
We can also define $R(G_1 \parallel G_2)$.

First, we require some preliminary definitions such as $R(\langle s_1, a \rangle, \langle s_2, b \rangle)$
and $R_{G_1 \parallel G_2}$.

Definition 14 ($R(\langle s_1, a \rangle, \langle s_2, b \rangle)$ and $R_{G_1 \parallel G_2}$)

We define $R(\langle s_1, a \rangle, \langle s_2, b \rangle)$ and $R_{G_1 \parallel G_2}$ as follows:

1. With $R(\langle s_1, a \rangle, \langle s_2, b \rangle)$, we denote the equivalence class that the state
$(\langle s_1, a \rangle, \langle s_2, b \rangle) \in \Pi_{G_1 \parallel G_2}$ belongs to.

2. Let $R_{G_1 \parallel G_2}$ be the set of equivalence classes of $\cong$ on $G_1 \parallel G_2$. ■



Definition 15 (A region simulation relation) 

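A subset of equivalence classes $\eta \subseteq R_{G_1 \parallel G_2}$ is a region simulation relation from
$G_1$ to $G_2$ iff for each $R(\langle s_1, a \rangle, \langle s_2, b \rangle) \in \eta$, the following three conditions
are satisfied:

1. for every $\delta$, $\sigma$, $\lambda_1$ and $\tau_{s_1}(p_1)$, if

$$\langle s_1, a \rangle \xrightarrow{\delta, \sigma, \lambda_1, \tau_{s_1}(p_1)} \langle s_1', a' \rangle$$

then $\langle s_2, b \rangle \xrightarrow{\delta, \sigma, \lambda_2, \tau_{s_2}(p_2)} \langle s_2', b' \rangle$ for some $\langle s_2', b' \rangle$ such that
$R(\langle s_1', a' \rangle, \langle s_2', b' \rangle) \in \eta$, where $a + \delta \models \tau_{s_1}(p_1)$ and $b + \delta \models \tau_{s_2}(p_2)$,
$(\sigma, \lambda_1, p_1) \in prob(s_1)$ and $(\sigma, \lambda_2, p_2) \in prob(s_2)$, $a' = (a + \delta)[\lambda_1 := 0]$ and
$b' = (b + \delta)[\lambda_2 := 0]$,

2. for $R(\langle s_1, a \rangle, \langle s_2, b \rangle) \in \eta$ and every transition

$$\langle s_1, a \rangle \xrightarrow{\delta, \sigma, \lambda_1, p_1, \tau_{s_1}(p_1)} \langle s_1', a' \rangle$$

of $G_1$, there exists a transition

$$\langle s_2, b \rangle \xrightarrow{\delta, \sigma, \lambda_2, p_2, \tau_{s_2}(p_2)} \langle s_2', b' \rangle$$

of $G_2$ such that $(p_1, p_2) \in \eta$ and $R(\langle s_1', a' \rangle, \langle s_2', b' \rangle) \in \eta$.

3. If $G_1$ can wait for $\delta$ in $\langle s_1, a \rangle$, then $G_2$ can wait for $\delta$ in $\langle s_2, b \rangle$. ■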



From the following theorem, there is an algorithm which, given two
probabilistic timed automata $G_1$ and $G_2$, decides whether $G_2$ simulates $G_1$ or not.

Theorem 1 (Timed simulation and region simulation)

Given $\eta \subseteq R_{G_1 \parallel G_2}$, let $R_\eta = \{(\langle s_1, a \rangle, \langle s_2, b \rangle) \mid R(\langle s_1, a \rangle, \langle s_2, b \rangle) \in \eta\}$.
$R_\eta$ is a timed simulation relation from $G_1$ to $G_2$, denoted by $G_1 \preceq G_2$, iff $\eta$
is a region simulation relation from $G_1$ to $G_2$.



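Proof 1 We outline the proof of the above theorem as follows:

1. $\eta$ is a region simulation relation if $R_\eta$ is a timed simulation relation:
Assume that $R_\eta$ is a timed simulation relation from $G_1$ to $G_2$.
Let $(\langle s_1, a \rangle, \langle s_2, b \rangle) \in R_\eta$, $(\sigma, \lambda_1, p) \in prob((s_1, s_2))$ and $\tau_{(s_1, s_2)}(p)$ be
such that $\langle s_1, a \rangle \xrightarrow{\delta, \sigma, \lambda_1, p, \tau_{(s_1, s_2)}(p)} \langle s_1', a' \rangle$. As $R_\eta$ is a timed simulation
relation, some $\langle s_2', b' \rangle$ exists such that $\langle s_2, b \rangle \xrightarrow{\delta, \sigma, \lambda_2, p, \tau_{(s_1, s_2)}(p)} \langle s_2', b' \rangle$.
This $\langle s_2', b' \rangle$ satisfies the conditions of a region simulation relation.
From the above, $\eta$ is a region simulation relation.

2. $R_\eta$ is a timed simulation relation if $\eta$ is a region simulation relation:
Assume that $\eta$ is a region simulation relation. Let $(\langle s_1, a \rangle, \langle s_2, b \rangle) \in R_\eta$
and let $\langle s_1, a \rangle \xrightarrow{\delta, \sigma, \lambda_1, p_1, \tau_{s_1}(p_1)} \langle s_1', a' \rangle$ for some $\delta$, $\sigma$, $\lambda_1$, $p_1$
and $\tau_{s_1}(p_1)$. We need to show that there exists $\langle s_2', b' \rangle$ such that
$\langle s_2, b \rangle \rightarrow \langle s_2', b' \rangle$ and $(\langle s_1', a' \rangle, \langle s_2', b' \rangle) \in R_\eta$.

By the fact that $\eta$ is a region simulation relation, $\langle s_2, b \rangle \xrightarrow{\delta, \sigma, \lambda, p, \tau_{s_2}(p)} \langle s_2', b' \rangle$
for some $\langle s_2', b' \rangle$ such that $R(\langle s_1', a' \rangle, \langle s_2', b' \rangle) \in \eta$,
where $(p_1, p) \in \eta$, and if $G_1$ can wait for $\delta$ in $\langle s_1, a \rangle$ then $G_2$ can wait
for $\delta$ in $\langle s_2, b \rangle$.

We claim that $\langle s_2, b \rangle \xrightarrow{\delta, \sigma, \lambda_2, p_2, \tau_{s_2}(p_2)} \langle s_2', b' \rangle$, which will imply the desired
result. Here, we define $p_2 = p$, $a + \delta \models \tau_{s_1}(p_1)$, $b + \delta \models \tau_{s_2}(p_2)$, $(\sigma, \lambda_1, p_1) \in
prob(s_1)$, $(\sigma, \lambda_2, p_2) \in prob(s_2)$, $a' = (a + \delta)[\lambda_1 := 0]$, $b' = (b + \delta)[\lambda_2 := 0]$.
Moreover, for every $\langle \bar{s}_1, \mathbf{0} \rangle$, there exists an initial state $\langle \bar{s}_2, \mathbf{0} \rangle$ with
$(\langle \bar{s}_1, \mathbf{0} \rangle, \langle \bar{s}_2, \mathbf{0} \rangle) \in R_\eta$.

From the above result, we can show $(\langle s_1', a' \rangle, \langle s_2', b' \rangle) \in R_\eta$.

From the above, $R_\eta$ is a timed simulation relation. ■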



Thus, the problem of checking whether a probabilistic timed automaton $G_1$
timed-simulates another probabilistic timed automaton $G_2$ can be reduced to
computing the region simulation relation over the equivalence classes $R_{G_1 \parallel G_2}$.
For this purpose, any of the existing algorithms for computing simulation [14]
can be adopted to obtain an algorithm.
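
As a rough illustration of what computing a simulation over a finite graph involves, the sketch below computes the greatest simulation between two finite labelled transition systems by iterated refinement. It is a deliberately simplified, non-probabilistic stand-in: it is not the algorithm of [14], it ignores probability distributions and weight functions, and the dictionary encoding of transitions is an assumption of this example.

```python
def greatest_simulation(states1, states2, trans1, trans2):
    """Largest relation rel such that (s, t) in rel implies t can match every move of s.

    trans1, trans2: dict state -> set of (label, successor) pairs.
    """
    rel = {(s, t) for s in states1 for t in states2}
    changed = True
    while changed:
        changed = False
        for (s, t) in list(rel):
            for (label, s_next) in trans1.get(s, ()):
                matched = any(
                    l == label and (s_next, t_next) in rel
                    for (l, t_next) in trans2.get(t, ())
                )
                if not matched:          # t cannot answer this move of s
                    rel.discard((s, t))
                    changed = True
                    break
    return rel


trans1 = {"s0": {("a", "s1")}, "s1": set()}
trans2 = {"t0": {("a", "t1"), ("b", "t2")}, "t1": set(), "t2": set()}
forward = greatest_simulation(["s0", "s1"], ["t0", "t1", "t2"], trans1, trans2)
backward = greatest_simulation(["t0", "t1", "t2"], ["s0", "s1"], trans2, trans1)
```

In this example `t0` simulates `s0`, but not the other way round, because `s0` has no `b` move. Extending the check to the probabilistic setting would additionally require a weight function between the matched distributions.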



6 Application of Our Method to Stepwise Refinement Development

In general, we develop real-time systems with discrete probability distributions
by refining them stepwise. That is, we refine an abstract specification into a concrete
specification. In this case, proving both trace distribution inclusion and timed
trace inclusion is difficult [1,13]. However, with the method proposed in this
paper, if we can show that each move of a probabilistic timed automaton $G_1$
can be simulated by a probabilistic timed automaton $G_2$, then we can conclude
that each timed trace distribution of $G_1$ is also a timed trace distribution of $G_2$.
This idea is known as the simulation method [15] and constitutes a very useful
technique for the analysis of real-time and distributed systems [6,9].

Therefore, in this section, we apply our proposed method to stepwise
development of real-time systems. Moreover, we consider internal behaviors and
propose timed weak simulation, as in process algebra [15]. We propose stepwise
refinement using timed weak simulation as follows:

1. First, we refine the abstract specification into a concrete specification by adding
internal behaviors.

2. Secondly, we require that the concrete specification be weakly simulated by the
abstract specification. That is, the concrete specification should be designed such
that it is contained in the abstract specification.

[Figure: (1) probabilistic timed automaton G1; (2) probabilistic timed automaton G2]

Fig. 1. A probabilistic timed simulation relation



3. Consider Figure 1. According to the previous timed simulation relation, there
does not exist a timed simulation relation from $G_1$ to $G_2$. But $G_1$ moves
from $s_2'$ and $s_3'$ to $s_4'$ with probability 1.0, and moves from $s_2'$ and $s_3'$ to $s_5'$
with probability 1.0. On the other hand, $G_2$ moves from $s_2''$ and $s_3''$ to $s_4''$
with probability 1.0, and moves from $s_2''$ and $s_3''$ to $s_5''$ with probability 1.0.
For practical problems, it is better that there exists a probabilistic
timed simulation relation from $G_1$ to $G_2$, obtained by extending a timed simulation
relation.

From these results, we propose a probabilistic timed simulation relation, and 
apply this relation to stepwise refinement development of real-time systems. 



6.1 A Probabilistic Timed Weak Simulation Relation 



We refine an abstract specification stepwise into a concrete specification using a
probabilistic timed weak simulation relation. We consider both a combined
transition and an internal event int. Intuitively, there exists a probabilistic timed
weak simulation relation from $G_1$ to $G_2$ in Figure 2, where $G_1$ is the concrete
specification and $G_2$ is the abstract specification.

First we define both a combined transition and an internal transition in order
to show that $G_1$ is simulated by $G_2$.



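Definition 16 (A combined transition)

Let $\langle s, a \rangle$ be a state of a probabilistic timed automaton $G$. We say that

$$\langle s, a \rangle \xrightarrow{\delta,\sigma,\lambda,p,\tau_s(p)}_C \langle s_j, a' \rangle$$

is a combined transition of $\langle s, a \rangle$ if $p$ is a convex combination of the set

$$C = \{\, p_i \mid \langle s, a \rangle \xrightarrow{\delta,\sigma,\lambda,p_i,\tau_s(p_i)} \langle s_i, a' \rangle \text{ and } p_i \in P_i(\langle s_i, a' \rangle) \,\}$$

of distributions, that is, for each $p_i$, there is a nonnegative real number $w_i$ such
that $\sum_i w_i = 1$ and $p = \sum_i w_i \cdot p_i$. ■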
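
The convex combination used in Definition 16 can be illustrated with a minimal sketch. It assumes that a discrete distribution is stored as a dictionary from target states to probabilities; the function name `combine` and the two example distributions are hypothetical and are not taken from the paper's figures.

```python
from fractions import Fraction


def combine(distributions, weights):
    """Return the convex combination sum_i w_i * p_i of discrete distributions.

    Each distribution is a dict mapping a target state to its probability
    (an assumed encoding, chosen only for this illustration).
    """
    if len(distributions) != len(weights):
        raise ValueError("need exactly one weight per distribution")
    if any(w < 0 for w in weights) or sum(weights) != 1:
        raise ValueError("weights must be nonnegative and sum to 1")
    combined = {}
    for p_i, w_i in zip(distributions, weights):
        for target, prob in p_i.items():
            combined[target] = combined.get(target, Fraction(0)) + w_i * prob
    return combined


# Two hypothetical distributions over successor states.
p1 = {"s4": Fraction(1)}
p2 = {"s4": Fraction(1, 2), "s5": Fraction(1, 2)}
p = combine([p1, p2], [Fraction(1, 4), Fraction(3, 4)])
```

Exact fractions are used so that the side condition on the weights can be checked without rounding error.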
Fig. 2. A probabilistic timed weak simulation relation



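Definition 17 (An internal transition)

There exists a transition

$$\langle s_1, a_1 \rangle \overset{\delta,\sigma,\lambda,p,\tau_{s_1}(p)}{\Longrightarrow} \langle s_n, a_n \rangle$$

if the following sequence of transitions occurs:

$$\langle s_1, a_1 \rangle \xrightarrow{\delta_1,\mathit{int},\lambda_1,p_1,\tau_{s_1}(p_1)} \langle s_2, a_2 \rangle \xrightarrow{\delta_2,\mathit{int},\lambda_2,p_2,\tau_{s_2}(p_2)} \cdots \longrightarrow \langle s_{j+1}, a_{j+1} \rangle \longrightarrow \cdots \xrightarrow{\ldots,\tau_{s_{n-1}}(p_{n-1})} \langle s_n, a_n \rangle$$

where $\delta = \delta_1 + \ldots + \delta_{n-1}$, $p = p_1 \otimes \ldots \otimes p_{n-1}$, $\lambda = \lambda_1 \cup \ldots \cup \lambda_{n-1}$. ■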



Finally we define a probabilistic timed weak simulation relation. 

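Definition 18 (A probabilistic timed weak simulation relation)

Given two probabilistic timed automata
$G_1 = (S_1, \Sigma_1, \bar{s}_1, \mathcal{X}_1, inv_1, prob_1, \langle \tau_{s_1} \rangle_{s_1 \in S_1})$ and
$G_2 = (S_2, \Sigma_2, \bar{s}_2, \mathcal{X}_2, inv_2, prob_2, \langle \tau_{s_2} \rangle_{s_2 \in S_2})$,
a probabilistic timed weak simulation relation from $G_1$ to $G_2$ is a binary relation
$R \subseteq \Omega_1 \times \Omega_2$, denoted by $G_1 \preceq G_2$, if the following conditions
(a timed simulation condition, a probability distribution simulation condition,
and an initial condition) are satisfied,

where $\Omega_1$ is the set of states $\langle s_1, a \rangle$ and $\Omega_2$ is the set of states
$\langle s_2, b \rangle$, with $s_1 \in S_1$, $a : \mathcal{X}_1 \to \mathbb{R}$, $s_2 \in S_2$, and
$b : \mathcal{X}_2 \to \mathbb{R}$; $G_1$ is the concrete specification, $G_2$ is the abstract
specification, and $\Sigma_1$ and $\Sigma_2$ contain int.

1. A probabilistic timed weak simulation condition:

for every $(\langle s_1, a \rangle, \langle s_2, b \rangle) \in R$, and for every $\delta, \sigma, \lambda_1, \tau_{s_1}(p_j)$, if

$$\langle s_1, a \rangle \overset{\delta,\sigma,\lambda_1,\tau_{s_1}(p_j)}{\Longrightarrow}_C \langle s_j, a' \rangle$$

then there exists $\langle s_k, b' \rangle$ such that

$$\langle s_2, b \rangle \overset{\delta,\sigma,\lambda_2,\tau_{s_2}(p_k)}{\Longrightarrow}_C \langle s_k, b' \rangle$$

and $(\langle s_j, a' \rangle, \langle s_k, b' \rangle) \in R$, where $a + \delta \models \tau_{s_1}(p_j)$ and $b + \delta \models \tau_{s_2}(p_k)$,
$(\sigma, \lambda_1, p_j) \in prob(s_1)$ and $(\sigma, \lambda_2, p_k) \in prob(s_2)$, $a' = (a + \delta)[\lambda_1 := 0]$ and
$b' = (b + \delta)[\lambda_2 := 0]$.

2. A probability distribution simulation condition:

for every $(\langle s_1, a \rangle, \langle s_2, b \rangle) \in R$ and every transition

$$\langle s_1, a \rangle \overset{\delta,\sigma,\lambda_1,p_j,\tau_{s_1}(p_j)}{\Longrightarrow}_C \langle s_j, a' \rangle$$

of $G_1$, there exists a transition

$$\langle s_2, b \rangle \overset{\delta,\sigma,\lambda_2,p_k,\tau_{s_2}(p_k)}{\Longrightarrow}_C \langle s_k, b' \rangle$$

of $G_2$ such that $(p_j, p_k) \in R$ and $(\langle s_j, a' \rangle, \langle s_k, b' \rangle) \in R$.

3. An initial condition:

for every initial state $\langle s_1, 0 \rangle$, there exists an initial state $\langle s_2, 0 \rangle$ with
$(\langle s_1, 0 \rangle, \langle s_2, 0 \rangle) \in R$. ■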



6.2 Example of Stepwise Refinement 

In this section, we show a simple example of a real-time system. We use the Ethernet
CSMA/CD protocol [16], consisting of senders and receivers. Here we specify
a concrete sender and an abstract sender, and verify whether there exists a
probabilistic timed weak simulation relation from the concrete specification to
the abstract specification.

In Figure 3(1) (abstract specification), the abstract specification moves from
s1 → s2 in order to send data. If it moves from s2 → s4 (that is, if we start
sending data), it moves from s4 → s6 (while data is being sent), and finally
it moves from s6 → s1. If it moves from s5 → s7 (if lines are busy), it moves
from s5 → s7 (that is, we try to send data again). In the abstract specification,
it nondeterministically moves from s1 → s2 or from s1 → s3. Moreover, it
probabilistically moves from s2 → s4 or from s2 → s5.

In Figure 3(2) (concrete specification), we refine the abstract specification by
adding an internal behavior such that it moves from s6' → s7', and by adding
another internal behavior such that it moves from s8' → s9'.

Clearly, there exists a probabilistic timed weak simulation relation from the
concrete specification to the abstract specification.
Fig. 3. Example of stepwise refinement



7 Conclusion 

In this paper, we defined a timed simulation relation between probabilistic timed
automata, together with an automatic verification method. Moreover, we defined a
probabilistic timed weak simulation relation and applied it to the stepwise refinement
of real-time systems. We are now working on both implementing an effective
verification system and applying our proposed methods to real problems.
Abstract. Protocols for distributed systems often make use of random
transitions to achieve a common goal. Popular examples are randomized
leader election protocols. We introduce probabilistic product automata
(PPA) as a natural model for this kind of system. To reason about
these systems, we propose to use a product version of linear temporal
logic (LTL⊗). The main result of the paper is a model-checking procedure
for PPA and LTL⊗. With its help, it is possible to check qualitative
properties of distributed systems automatically.



1 Introduction 

Randomization techniques have been employed to solve numerous problems of 
computing both sequentially and in parallel. Examples of probabilistic algo- 
rithms that are asymptotically better than their deterministic counterparts in 
solving various fundamental problems abound. It has also been shown that they 
allow solutions of problems which cannot be solved deterministically [LR81]. 
They have the advantages of simplicity and better performance both in theory 
and often in practice. Prominent examples of distributed randomized algorithms
are randomized leader election protocols [BGJ99]. An overview of the domain of
randomized algorithms is given in [PRRR01].

As for any kind of hardware or software system, it is important to develop
formal methods and tools for verifying the correctness of probabilistic
programs. Model checking, introduced independently in [CES83] and [QS82],
turned out to be one fruitful approach to automatically verify systems (see
[CW96] for an overview of model checking in practice). In the model-checking
approach, usually a finite system $M$, often an abstraction of a real system, and
a property, usually expressed as a temporal-logic formula $\varphi$ or as an
automaton describing the computations that adhere to the property, are given. The
model-checking procedure decides whether the set or tree formed of all
computations of $M$ satisfies $\varphi$, or, in other words, whether the given system
satisfies the required property. Temporal logics used for expressing requirements are
usually linear temporal logics (LTL) (as initially proposed by Pnueli in [Pnu77]) or
branching-time logics like the computation-tree logics CTL and CTL*.

* This author is supported by the European Research Training Network “Games”. 
In the probabilistic setting, one is no longer interested in all but only in
almost all computations. Thus, (sets or paths of trees of) computations with zero
probability are ignored when deciding whether a given property is satisfied.

Temporal logics for probabilistic programs were initially studied in [LS82] and
[HS84]. [Var85] introduced the notion of Concurrent Probabilistic Programs (CPPs)
and provided a procedure for checking their LTL properties.

In this paper, we introduce Probabilistic Product Automata (PPA) as a model
for distributed probabilistic systems. Roughly, a PPA is a parallel product of
CPPs. They are enriched, however, with actions to synchronize the concurrent
execution. In the non-randomized setting, product automata have proven to be
a reasonable model for distributed systems. To support their analysis, a product
version of linear temporal logic (LTL⊗) has been defined. It allows the definition
of properties for each process, i.e., each single automaton or CPP of the overall
system. The properties for each process are then combined by means of Boolean
connectives into a single formula. Model-checking procedures for LTL⊗ and
product automata have been studied in [Thi95]. We show in this paper how LTL⊗
properties of PPA can be checked automatically.

Our method follows the automata-theoretic approach: Given a PPA $A$, we
can construct a single concurrent probabilistic program $M$ with the same
probabilistic behavior. For a property $\varphi$ expressed as an LTL⊗ formula, we consider
its negation and construct a (non-randomized) product automaton $A_{\neg\varphi}$ that
captures the behavior violating $\varphi$. This automaton is transformed into a
deterministic Streett automaton $B_{\neg\varphi}$. The common behavior of $M$ and
$B_{\neg\varphi}$ represents the behavior of the underlying system that does not satisfy
our property. If this behavior is empty in a probabilistic sense, $\varphi$ is satisfied by $A$.

We define a kind of intersection of $M$ and $B_{\neg\varphi}$ and provide simple graph
algorithms that answer the question whether $M$ and $B_{\neg\varphi}$ have a common
behavior, in the probabilistic sense mentioned above. Our procedure is quadratic
in the size of $M$ (which grows exponentially with the number of parallel CPPs)
and double exponential in the size of $\varphi$.

Often, distributed systems are modelled as a single nondeterministic one.
While our model-checking procedure makes use of this idea, our approach of
starting with a distributed system has several advantages. Firstly, it is more direct
to model a distributed protocol as a PPA rather than as a single system.
Secondly, it is more direct to formulate properties of the distributed system on a
“per process basis” as done in LTL⊗ rather than for a system capturing the
interleaved behavior. Thirdly, performance benefits can be expected in practice,
since our procedure works on-the-fly: The main question to answer is whether
the intersection of $M$ and $B_{\neg\varphi}$ has empty behavior. If a single behavior is found,
the model-checking procedure can stop and provide this counterexample. Since
the combined system can be checked in a top-down manner, often only a part
of this system has to be constructed. This implies that only a part of $M$ and a
part of $B_{\neg\varphi}$ has to be constructed. Since $M$ and $B_{\neg\varphi}$ are (double) exponentially
larger than the underlying structures $A$ and $\varphi$, this has a considerable effect in
practice.
This paper is organized as follows. In the next section, we introduce the
necessary concepts and notation of words, graphs, and automata. Section 3 presents
our model for distributed probabilistic systems, probabilistic product automata.
LTL⊗ is defined in Section 4. The main contribution of the paper, the
model-checking algorithm for PPA and LTL⊗, is explained in Section 5. We sum up our
approach and mention directions for future work in the last section.

2 Preliminaries 

Given an alphabet $\Sigma$, $\Sigma^*$ and $\Sigma^\omega$ denote the set of finite and the set of
infinite words over $\Sigma$, respectively. Furthermore, let $\Sigma^\infty = \Sigma^* \cup \Sigma^\omega$ be the
set of words. For a word $\sigma = a_1 a_2 \ldots \in \Sigma^\omega$ and a natural $i \geq 1$, let $\sigma^i$
denote $a_i a_{i+1} \ldots$, the $i$th suffix of $\sigma$, and let $\sigma(i)$ denote $a_i$, the $i$th letter
of $\sigma$. Furthermore, take $\inf(\sigma)$ as the infinity set of $\sigma$, i.e., the set of letters
that occur infinitely often in $\sigma$. For a word $\sigma = a_1 \ldots a_n \in \Sigma^*$ and a natural
$i \in \{1, \ldots, n+1\}$, we accordingly define $\sigma^i$ to be the word $\sigma'$ with
$a_1 \ldots a_{i-1} \sigma' = \sigma$ (thus, $\sigma^{n+1}$ is the empty word $\varepsilon$) and, if $i \leq n$,
$\sigma(i)$ to be $a_i$.

Given a word $\sigma$ and an alphabet $\Sigma$, let $\sigma \upharpoonright \Sigma$ denote the word we get by
erasing from $\sigma$ the letters that are not contained in $\Sigma$.

Given a (directed) graph $G$ with nodes $V$ and edges $E$, we call a node $v \in V$
(a set $D \subseteq V$ of nodes) reachable from $v' \in V$ if there is a path from $v'$ to $v$
(to a node contained in $D$). A strongly connected component (SCC) of $G$ is a
maximal set $D$ of nodes such that every two nodes contained in $D$ are reachable
from each other. An SCC is called bottom if there is no edge leading out of it. We
define $G$ to be strongly connected if $V$ forms an SCC. Furthermore, $G$ is called
nontrivial if it contains at least one edge. A set $D \subseteq V$ is said to be nontrivial if
$G[D]$, the subgraph of $G$ induced by $D$, is nontrivial. The size of $G$, denoted by
$|G|$, is defined to be $|V| + |E|$.
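
The graph notions above can be made concrete with a short sketch. It assumes that a graph is given as a list of nodes and a list of edge pairs, and it uses Kosaraju's two-pass algorithm, which is one standard choice and not necessarily what the authors had in mind; the function names and the example graph are hypothetical.

```python
def sccs(nodes, edges):
    """Strongly connected components via Kosaraju's two-pass algorithm."""
    succ = {v: [] for v in nodes}
    pred = {v: [] for v in nodes}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # Pass 1: record nodes in order of DFS completion (iterative DFS).
    order, seen = [], set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(succ[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing completion order.
    components, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        comp, todo = set(), [root]
        assigned.add(root)
        while todo:
            node = todo.pop()
            comp.add(node)
            for prv in pred[node]:
                if prv not in assigned:
                    assigned.add(prv)
                    todo.append(prv)
        components.append(frozenset(comp))
    return components


def is_bottom(comp, edges):
    """An SCC is bottom if no edge leads out of it."""
    return all(v in comp for u, v in edges if u in comp)


def is_nontrivial(comp, edges):
    """A node set is nontrivial if its induced subgraph has at least one edge."""
    return any(u in comp and v in comp for u, v in edges)


# Hypothetical example graph with |V| + |E| = 5 + 6 = 11.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"), ("d", "c"), ("d", "e")]
components = sccs(nodes, edges)
size = len(nodes) + len(edges)
```

In this example the single node `e` forms a trivial bottom SCC, while the two nontrivial SCCs both have an outgoing edge.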

Let $\Sigma$ be an alphabet. An extended Streett automaton over $\Sigma$ is a tuple
$A = (S, S_0, \delta, F, \mathcal{G})$ where $S$ is its nonempty finite set of states, $S_0 \subseteq S$ is the set
of initial states, $\delta : S \times \Sigma \to 2^S$ is its transition function, $F \subseteq S$ is its set of final
states, and $\mathcal{G}$ is a subset of $(2^S)^2$. A run of $A$ on a word $\sigma = a_1 a_2 \ldots \in \Sigma^\omega$ (on a
word $\sigma = a_1 \ldots a_n \in \Sigma^*$) is a sequence of states $\rho = s_0 s_1 s_2 \ldots \in S^\omega$ (a sequence
of states $\rho = s_0 s_1 \ldots s_n \in S^*$) such that $s_0 \in S_0$ and, for each natural $i$ (for each
$i \in \{0, 1, \ldots, |\sigma| - 1\}$), $s_{i+1} \in \delta(s_i, a_{i+1})$. We call $\rho$ accepting if either $\sigma$ is finite
and $s_{|\sigma|} \in F$ or $\sigma$ is infinite and, for all pairs $(U, V) \in \mathcal{G}$, $\inf(\rho) \cap U \neq \emptyset$ implies
$\inf(\rho) \cap V \neq \emptyset$. The language of $A$, denoted by $L(A)$, is the set $\{\sigma \in \Sigma^\infty \mid$
there is an accepting run of $A$ on $\sigma\}$. $A$ is called deterministic if both $|S_0| = 1$
and $|\delta(s, a)| = 1$ for all $s \in S$, $a \in \Sigma$. Furthermore, the size $|A|$ of $A$ is defined
to be the size of its (transition) graph. An extended Büchi automaton over $\Sigma$ is
just an extended Streett automaton, but it employs an acceptance component
$\mathcal{F} \subseteq S$ instead of $\mathcal{G}$. A run over an infinite word is henceforth accepting if it
visits at least one state from $\mathcal{F}$ infinitely often.
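
Both acceptance conditions for infinite runs depend only on the infinity set of the run, which the following sketch makes explicit. It assumes that the infinity set is already known (for an ultimately periodic run it is the set of states on the loop); the function names and state names are hypothetical.

```python
def streett_accepts(inf_states, pairs):
    """Streett condition: for every pair (U, V), visiting U infinitely often
    implies visiting V infinitely often."""
    inf_states = set(inf_states)
    return all(not (inf_states & set(U)) or bool(inf_states & set(V))
               for U, V in pairs)


def buchi_accepts(inf_states, final):
    """Buchi condition: some state of the acceptance set recurs forever."""
    return bool(set(inf_states) & set(final))


# One hypothetical Streett pair: whenever s1 recurs, s2 must recur as well.
pairs = [({"s1"}, {"s2"})]
```

For instance, a run looping through `s0` alone is accepted because the premise of the only pair is never triggered, whereas a run looping through `s1` alone is rejected.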

Given an extended Büchi automaton $A$ over an alphabet $\Sigma$, there is a
deterministic extended Streett automaton $B$ over $\Sigma$ such that $L(B) = L(A)$ [GTW02].

3 Probabilistic Product Automata

Before we present our model for distributed concurrent probabilistic programs, 
we define the notion of its building blocks, the concurrent probabilistic programs. 
Let us describe the systems we want to study intuitively first. Figure 1 shows 
two concurrent probabilistic programs. The system starts in a nondeterministic 
state, shown as a circle, and selects nondeterministically a randomizing state, 
represented as a box. The transitions are labelled by actions which will later 
be used to synchronize the parallel execution of several concurrent probabilistic 
programs. In a randomizing state, the system chooses a nondeterministic suc- 
cessor state according to the probabilities of the outgoing arcs. Let us be more 
precise: 

Definition 1 (Concurrent Probabilistic Program). A concurrent probabilistic
program (CPP) over an alphabet $\Sigma$ is a tuple $M = (Q, N, R, \Delta, P, Q^{in})$
where

- $N$ and $R$ are disjoint nonempty finite sets of nondeterministic and
randomizing states, respectively, and $Q = N \cup R$ is the set of states,

- $\Delta \subseteq N \times \Sigma \times R$ is the set of transitions,

- $P : R \times N \to [0, 1]$ is the transition probability distribution¹ such that, for
each $q \in R$, $\sum_{q' \in N} P(q, q') = 1$, and

- $Q^{in} \subseteq Q$ is the set of initial states.

A nondeterministic state $q \in N$ is called enabled in $M$ if it has outgoing
transitions, i.e. if there is a $q' \in R$ and an $a \in \Sigma$ such that $(q, a, q') \in \Delta$. The
set of enabled states of $M$ is denoted by $N^M$.
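
As a minimal sketch, a CPP can be represented by a small data class that checks the side conditions of Definition 1 and computes the enabled states. The encoding (frozensets for state sets, a dictionary for $P$ with missing pairs read as probability 0), the class and method names, and the toy instance are all assumptions made for illustration; the toy instance is not meant to reproduce Figure 1.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class CPP:
    N: frozenset        # nondeterministic states
    R: frozenset        # randomizing states
    delta: frozenset    # transitions (q, a, q2) with q in N and q2 in R
    P: dict             # P[(q, q2)] for q in R, q2 in N; missing pairs mean 0
    initial: frozenset  # initial states, a subset of N | R

    def check(self, alphabet):
        """Raise ValueError if a side condition of the definition is violated."""
        if not self.N or not self.R or self.N & self.R:
            raise ValueError("N and R must be nonempty and disjoint")
        for q, a, q2 in self.delta:
            if q not in self.N or a not in alphabet or q2 not in self.R:
                raise ValueError(f"bad transition {(q, a, q2)}")
        for (q, q2) in self.P:
            if q not in self.R or q2 not in self.N:
                raise ValueError(f"P is only defined on R x N, got {(q, q2)}")
        for q in self.R:
            if sum(self.P.get((q, q2), 0) for q2 in self.N) != 1:
                raise ValueError(f"probabilities leaving {q} do not sum to 1")
        if not self.initial <= self.N | self.R:
            raise ValueError("initial states must be states")

    def enabled(self):
        """Nondeterministic states with at least one outgoing transition."""
        return frozenset(q for q, _, _ in self.delta)


half = Fraction(1, 2)
toy = CPP(
    N=frozenset({"p0", "p2"}),
    R=frozenset({"p1"}),
    delta=frozenset({("p0", "a", "p1")}),
    P={("p1", "p0"): half, ("p1", "p2"): half},
    initial=frozenset({"p0"}),
)
toy.check({"a", "b"})
```

In the toy instance, `p0` is enabled while `p2` is a nondeterministic state without outgoing transitions, so trajectories may end there.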

Sequences of transitions of a CPP involve actions as well as random choices.
To be able to handle both kinds of transitions in the same manner, we use
the symbol $p$ to denote a random move and set $\Sigma_p := \Sigma \cup \{p\}$. To study the
probabilistic behavior of a CPP, we consider its possible random executions,
when fixing the nondeterministic choices by means of a scheduler. Given a partial
execution of the system, i.e., a sequence of states and actions (including $p$),
ending in an enabled nondeterministic state, a scheduler tells us which action
and successor state to choose: A scheduler of a CPP $M = (Q, N, R, \Delta, P, Q^{in})$
over $\Sigma$ is a mapping $u : (Q\Sigma_p)^* N^M \to \Sigma \times R$ such that, for each $x \in (Q\Sigma_p)^*$
and $q \in N^M$, $u(xq) = (a, q')$ implies $(q, a, q') \in \Delta$.

For the rest of the paper, we fix a natural $K \geq 1$, the set $Proc = \{1, \ldots, K\}$
of processes, and a distributed alphabet $\tilde{\Sigma} = (\Sigma_1, \ldots, \Sigma_K)$, a tuple of (not
necessarily disjoint) alphabets $\Sigma_i$. We use $\bigcup_i \Sigma_i$ as a shorthand for $\bigcup_{i \in Proc} \Sigma_i$.
For $a \in \bigcup_i \Sigma_i$, let $loc(a) := \{i \in Proc \mid a \in \Sigma_i\}$ be the set of processes that the
action $a$ participates in.

We are now ready to define our model for distributed concurrent probabilistic 
programs: 



¹ We usually write $P(q, q')$ instead of $P((q, q'))$.
Definition 2 (Probabilistic Product Automaton). A probabilistic product
automaton (PPA) over $\tilde{\Sigma}$ is a structure $A = ((A_i)_{i \in Proc}, S^{in})$ such that

- for each $i \in Proc$, $A_i$ is a CPP $(Q_i, N_i, R_i, \Delta_i, P_i)$ (without set of initial
states) over $\Sigma_i$ and

- $S^{in} \subseteq \prod_{i \in Proc} Q_i$ is the set of global initial states.

The two CPPs shown in Figure 1 together form a PPA over $(\{a, b\}, \{b, c\})$
when setting the initial state of the system to $(p_0, q_0)$. Note that the probability
distributions $P_i$ are reflected by transition arcs in case of nonzero transition
probabilities.




Fig. 1. A probabilistic product automaton 



A PPA $A = ((A_i)_{i \in Proc}, S^{in})$, $A_i = (Q_i, N_i, R_i, \Delta_i, P_i)$, determines the CPP
$M_A = (Q_A, N_A, R_A, \Delta_A, P_A, Q_A^{in})$ over $\bigcup_i \Sigma_i$, where

- $Q_A = \prod_{i \in Proc} Q_i$
(in the following, for $q = (q_1, \ldots, q_K) \in Q_A$ and $i \in Proc$, let $q[i]$ denote $q_i$),

- $N_A = \prod_{i \in Proc} N_i$,

- $R_A = Q_A \setminus N_A$,

- $(q, a, q') \in \Delta_A$ if
  • $(q[i], a, q'[i]) \in \Delta_i$ for all $i \in loc(a)$ and
  • $q[i] = q'[i]$ for all $i \notin loc(a)$,

- $Q_A^{in} = S^{in}$, and

- $P_A(q, q') = \prod_{i \in Proc,\ q[i] \in R_i} P_i(q[i], q'[i])$ if, for each $i \in Proc$,
$q[i] \in N_i$ implies $q[i] = q'[i]$, and $P_A(q, q') = 0$ otherwise.

It is easy to verify that $M_A$ is indeed a CPP.
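
The construction of the global transitions and probabilities can be sketched as follows. The sketch makes several assumptions beyond the text: processes are indexed from 0, each component is a tuple `(N_i, R_i, delta_i, P_i)` of Python sets and dictionaries, and global transitions are only generated from source states in $N_A$ (which is what makes the result a CPP). The function names and the two toy components are hypothetical and do not reproduce Figure 1.

```python
from fractions import Fraction
from itertools import product


def loc(a, alphabets):
    """Indices of the processes whose alphabet contains action a."""
    return {i for i, sigma in enumerate(alphabets) if a in sigma}


def global_transitions(components, alphabets):
    """Synchronized transitions of the product; components[i] = (N_i, R_i, delta_i, P_i)."""
    result = set()
    for q in product(*[sorted(c[0]) for c in components]):  # sources in N_A
        for a in set().union(*alphabets):
            procs = sorted(loc(a, alphabets))
            # Local successors of every participating process under action a.
            choices = [[r for (s, b, r) in components[i][2] if s == q[i] and b == a]
                       for i in procs]
            for combo in product(*choices):  # empty if some participant is stuck
                target = list(q)
                for i, r in zip(procs, combo):
                    target[i] = r
                result.add((q, a, tuple(target)))
    return result


def global_probability(q, q2, components):
    """P_A(q, q2): randomizing components move, all others must stay put."""
    prob = Fraction(1)
    for i, (N_i, _R_i, _delta_i, P_i) in enumerate(components):
        if q[i] in N_i:
            if q[i] != q2[i]:
                return Fraction(0)
        else:
            prob *= P_i.get((q[i], q2[i]), Fraction(0))
    return prob


half = Fraction(1, 2)
alphabets = [{"a", "b"}, {"b", "c"}]
components = [
    ({"p0", "p2"}, {"p1"}, {("p0", "a", "p1")}, {("p1", "p0"): half, ("p1", "p2"): half}),
    ({"q0", "q2"}, {"q1"}, {("q0", "c", "q1")}, {("q1", "q0"): half, ("q1", "q2"): half}),
]
transitions = global_transitions(components, alphabets)
```

Since no toy component offers a `b`-transition, the shared action `b` never fires; the local actions `a` and `c` move one component and leave the other untouched.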

The PPA from Figure 1 induces the CPP given by Figure 2. 

Let us now recall the probabilistic setting needed to reason about probabilis- 
tic programs. 

A nonempty set of possible outcomes of an experiment of chance is called
sample space. Let $\Omega$ be a sample space. A set $\mathfrak{B} \subseteq 2^\Omega$ is called Borel field over $\Omega$
if it contains $\Omega$, $\Omega \setminus E$ for each $E \in \mathfrak{B}$, and the union of any countable sequence
of sets from $\mathfrak{B}$. A Borel field $\mathfrak{B}$ is generated by an at most countable set $\mathcal{E}$,
denoted by $\mathfrak{B} = \langle \mathcal{E} \rangle$, if $\mathfrak{B}$ is the closure of $\mathcal{E}$'s elements under complement and
countable union.

A probability space is a triple $PS = (\Omega, \mathfrak{B}, \mu)$ where $\Omega$ is a sample space,
$\mathfrak{B}$ is a Borel field over $\Omega$, and $\mu$ is a mapping $\mathfrak{B} \to [0, 1]$ such that $\mu(\Omega) = 1$
and $\mu(\bigcup_{i=1}^{\infty} E_i) = \sum_{i=1}^{\infty} \mu(E_i)$ for any sequence $E_1, E_2, \ldots$ of pairwise disjoint
sets from $\mathfrak{B}$. We call $\mu$ a probability measure. An event $E \in \mathfrak{B}$ is said to occur
almost surely if $\mu(E) = 1$.

A scheduler $u$ of a CPP $M = (Q, N, R, \Delta, P, Q^{in})$ over an alphabet $\Sigma$ induces
a probability space $PS_{M,u} = (\Omega_{M,u}, \mathfrak{B}_{M,u}, \mu_{M,u})$ as follows:

- the sample space consists of all infinite and maximal finite sequences of states
and actions (including $p$) that respect the transition relation and can be
stepwise taken with non-zero probability. Furthermore, the nondeterministic
transitions must follow the scheduler. We only consider a finite sequence
when it ends in a state with no outgoing transitions: Let
$\Omega_{M,u} = \{q_1 a_1 q_2 a_2 \ldots \in (Q\Sigma_p)^\omega \mid q_1 \in Q^{in}$ and, for all $i \geq 1$,
($q_i \in R$ and $q_{i+1} \in N$ and $a_i = p$ and $P(q_i, q_{i+1}) > 0$) or
($q_i \in N^M$ and $a_i \in \Sigma$ and $u(q_1 a_1 \ldots q_i) = (a_i, q_{i+1})$)$\}$
$\cup\ \{q_1 a_1 \ldots q_{n-1} a_{n-1} q_n \in (Q\Sigma_p)^*(N \setminus N^M) \mid q_1 \in Q^{in}$ and, for
all $i \in \{1, \ldots, n-1\}$, ($q_i \in R$ and $q_{i+1} \in N$ and $a_i = p$ and $P(q_i, q_{i+1}) > 0$)
or ($q_i \in N^M$ and $u(q_1 a_1 \ldots q_i) = (a_i, q_{i+1})$)$\}$ be the set of trajectories of $M$
wrt. $u$,

- $\mathfrak{B}_{M,u} = \langle \{C_{M,u}(x) \mid x \in (Q\Sigma_p)^* N\} \rangle$ where $C_{M,u}(x) = \{x' \in \Omega_{M,u} \mid x$ is a
prefix of $x'\}$ is the basic cylinder set of $x$ wrt. $M$ and $u$, and

- $\mu_{M,u}$ is uniquely given by $\mu_{M,u}(\Omega_{M,u}) = 1$ and, for $n \geq 1$,

$$\mu_{M,u}(C_{M,u}(q_1 a_1 \ldots q_{n-1} a_{n-1} q_n)) = P'(q_1, q_2) \cdot \ldots \cdot P'(q_{n-1}, q_n)$$

where

$$P'(q_i, q_{i+1}) = \begin{cases} P(q_i, q_{i+1}) & \text{if } q_i \in R \\ 1 & \text{if } q_i \in N \end{cases}$$
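
The measure of a basic cylinder set is thus a product over the random moves of the prefix. The following minimal sketch computes it from the state sequence alone; it assumes that the prefix is consistent with the scheduler (otherwise the cylinder is empty), and the function name, state names, and probabilities are hypothetical.

```python
from fractions import Fraction


def cylinder_measure(states, R, P):
    """Product of P'(q_i, q_{i+1}) along the state sequence q_1 ... q_n.

    Assumes the sequence follows the scheduler, so that moves out of
    nondeterministic states contribute the factor 1.
    """
    measure = Fraction(1)
    for q, q_next in zip(states, states[1:]):
        if q in R:  # random move: multiply by its probability
            measure *= P.get((q, q_next), Fraction(0))
    return measure


R = {"p1"}
P = {("p1", "p0"): Fraction(1, 2), ("p1", "p2"): Fraction(1, 2)}
short = cylinder_measure(["p0", "p1", "p2"], R, P)
longer = cylinder_measure(["p0", "p1", "p0", "p1", "p2"], R, P)
```

Each additional random move in the prefix halves the measure in this example, and a prefix consisting of the initial state alone has measure 1.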



4 Product LTL 

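In this section, we recall the definition of a product version of linear temporal
logic, denoted by LTL⊗, as defined in [MD97]. It is based on Pnueli’s LTL.
However, since we deal with action-based systems, we provide an action-based version
of the logic as well. Furthermore, we extend the definition in a straightforward
manner to deal with finite and infinite sequences.

Let $\Sigma$ be an alphabet. The set $\mathrm{LTL}(\Sigma)$ of Linear Temporal Logic (LTL)
formulas over $\Sigma$ is given by the following grammar:

$$\varphi ::= tt \mid \neg\varphi \mid \varphi_1 \vee \varphi_2 \mid \langle a \rangle \varphi \ (a \in \Sigma) \mid \varphi\, U\, \psi$$

An $\mathrm{LTL}(\Sigma)$ formula is inductively interpreted over $\sigma \in \Sigma^\infty$ as follows:

- $\sigma \models_\Sigma tt$

- $\sigma \models_\Sigma \neg\varphi$ if $\sigma \not\models_\Sigma \varphi$

- $\sigma \models_\Sigma \varphi \vee \psi$ if $\sigma \models_\Sigma \varphi$ or $\sigma \models_\Sigma \psi$

- $\sigma \models_\Sigma \langle a \rangle \varphi$ if $\sigma \neq \varepsilon$, $\sigma(1) = a$, and $\sigma^2 \models_\Sigma \varphi$

- $\sigma \models_\Sigma \varphi\, U\, \psi$ if there is an $i > 0$ such that $\sigma^i \models_\Sigma \psi$ and, for each
$j \in \{1, \ldots, i-1\}$, $\sigma^j \models_\Sigma \varphi$

The language of a formula $\varphi \in \mathrm{LTL}(\Sigma)$, denoted by $L(\varphi)$, is defined to be
the set $\{\sigma \in \Sigma^\infty \mid \sigma \models_\Sigma \varphi\}$. Note that one can construct an extended Büchi
automaton $B$ over $\Sigma$ such that $L(B) = L(\varphi)$ [VW86].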

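Product LTL formulas are Boolean combinations of LTL formulas, each
formulated for a single process. More specifically, the set $\mathrm{LTL}^{\otimes}(\tilde{\Sigma})$ of Product
Linear Temporal Logic (LTL⊗) formulas over $\tilde{\Sigma}$ is given as:

$$\varphi ::= [\varphi_i]_i \ (\varphi_i \in \mathrm{LTL}(\Sigma_i)) \mid \varphi_1 \vee \varphi_2 \mid \varphi_1 \wedge \varphi_2$$

An $\mathrm{LTL}^{\otimes}(\tilde{\Sigma})$ formula is inductively interpreted over $\sigma \in (\bigcup_i \Sigma_i)^\infty$ as follows:

- $\sigma \models_\otimes [\varphi_i]_i$ if $\sigma \upharpoonright \Sigma_i \models_{\Sigma_i} \varphi_i$

- $\sigma \models_\otimes \varphi \vee \psi$ if $\sigma \models_\otimes \varphi$ or $\sigma \models_\otimes \psi$

- $\sigma \models_\otimes \varphi \wedge \psi$ if $\sigma \models_\otimes \varphi$ and $\sigma \models_\otimes \psi$

Thus, given a system, we restrict the run to the actions “interesting” for the
process $i$ and take the usual semantics of the LTL formula.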
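
For finite words, the two semantics above can be executed directly. The sketch below is restricted to finite words and encodes formulas as nested tuples; this encoding, the function names, and the example formula are assumptions made for illustration only.

```python
TT = ("tt",)


def holds(formula, word):
    """Does the finite word (a tuple of letters) satisfy the LTL formula?"""
    op = formula[0]
    if op == "tt":
        return True
    if op == "not":
        return not holds(formula[1], word)
    if op == "or":
        return holds(formula[1], word) or holds(formula[2], word)
    if op == "next":    # <a> phi: first letter is a, rest satisfies phi
        return len(word) > 0 and word[0] == formula[1] and holds(formula[2], word[1:])
    if op == "until":   # phi U psi, scanning the suffixes from left to right
        for i in range(len(word) + 1):
            if holds(formula[2], word[i:]):
                return True
            if not holds(formula[1], word[i:]):
                return False
        return False
    raise ValueError(f"unknown operator {op!r}")


def project(word, alphabet):
    """Erase all letters that are not contained in the alphabet."""
    return tuple(a for a in word if a in alphabet)


def holds_product(formula, word, alphabets):
    """Product LTL: local formulas are read on the projected word."""
    op = formula[0]
    if op == "loc":     # [phi_i]_i
        return holds(formula[2], project(word, alphabets[formula[1]]))
    if op == "or":
        return holds_product(formula[1], word, alphabets) or holds_product(formula[2], word, alphabets)
    if op == "and":
        return holds_product(formula[1], word, alphabets) and holds_product(formula[2], word, alphabets)
    raise ValueError(f"unknown operator {op!r}")


def eventually(f):
    """Derived operator: tt U f."""
    return ("until", TT, f)


alphabets = {1: {"a", "b"}, 2: {"b", "c"}}
f1 = eventually(("next", "a", ("next", "b", TT)))
f2 = eventually(("next", "c", ("next", "b", TT)))
spec = ("and", ("loc", 1, f1), ("loc", 2, f2))
word = ("a", "c", "b")
```

On the word `a c b`, process 1 sees `a b` and process 2 sees `c b`, so both local formulas hold. Evaluating `f1` on the unprojected word fails, because `c` separates `a` from `b`.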

Note that we did not introduce negation on the outer LTL⊗ level since it can
be “pushed inwards” to each single $\mathrm{LTL}(\Sigma_i)$ formula.

One might be tempted to understand an LTL⊗ formula as an LTL formula
over the alphabet $\Sigma = \bigcup_i \Sigma_i$ (abstracting away $[\,]_i$). But it is easy to see that
this yields a different semantics.

The language of a formula $\varphi \in \mathrm{LTL}^{\otimes}(\tilde{\Sigma})$, denoted by $L(\varphi)$, is defined to be
the set $\{\sigma \in (\bigcup_i \Sigma_i)^\infty \mid \sigma \models_\otimes \varphi\}$. As in the LTL case, we can construct
a product automaton $A$ over $\tilde{\Sigma}$ such that $L(A) = L(\varphi)$ and, from $A$, build an
extended Büchi automaton $B$ over $\bigcup_i \Sigma_i$ with $L(B) = L(A) = L(\varphi)$ [MD97].
5 Probabilistic Model Checking

In this section, we clarify the notion when a PPA satisfies an LTL⊗ formula
in a probabilistic sense. Furthermore, we provide an algorithm answering this
question.



5.1 Satisfaction 

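For checking whether a PPA satisfies a formula, we first define the set of
sequences following a given scheduler and satisfying a given formula or being
accepted by an automaton (let $\Sigma$ be an alphabet):

1. For a CPP $M$ over $\Sigma$, a scheduler $u$ of $M$, and a formula $\varphi \in \mathrm{LTL}(\Sigma)$, let
$L_{M,u}(\varphi) := \{x \in \Omega_{M,u} \mid x \upharpoonright \Sigma \models_\Sigma \varphi\}$.

2. For a PPA $A$ over $\tilde{\Sigma}$, a scheduler $u$ of $M_A$, and a formula $\varphi \in \mathrm{LTL}^{\otimes}(\tilde{\Sigma})$, let
$L_{A,u}(\varphi) := \{x \in \Omega_{M_A,u} \mid x \upharpoonright (\bigcup_i \Sigma_i) \models_\otimes \varphi\}$.

3. For a CPP $M$ over $\Sigma$, a scheduler $u$ of $M$, and an extended Streett (Büchi)
automaton $B$ over $\Sigma$, let $L_{M,u}(B) := \{x \in \Omega_{M,u} \mid x \upharpoonright \Sigma \in L(B)\}$.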

We can show, using a simple induction, that these sets are measurable: 

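Proposition 1. Let $\Sigma$ be an alphabet.

1. Given a CPP $M$ over $\Sigma$ and a formula $\varphi \in \mathrm{LTL}(\Sigma)$, we have
$L_{M,u}(\varphi) \in \mathfrak{B}_{M,u}$ for each scheduler $u$ of $M$.

2. Given a PPA $A$ over $\tilde{\Sigma}$ and a formula $\varphi \in \mathrm{LTL}^{\otimes}(\tilde{\Sigma})$, it holds that
$L_{A,u}(\varphi) \in \mathfrak{B}_{M_A,u}$ for each scheduler $u$ of $M_A$.

3. Given a CPP $M$ over $\Sigma$ and an extended Streett (Büchi) automaton $B$ over
$\Sigma$, we have $L_{M,u}(B) \in \mathfrak{B}_{M,u}$ for each scheduler $u$ of $M$.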

We can now define the satisfaction relation for CPPs and LTL, PPA and
LTL⊗, as well as CPPs and Streett automata.

Definition 3. Let $\Sigma$ be an alphabet.

1. A CPP $M$ over $\Sigma$ is said to satisfy a formula $\varphi \in \mathrm{LTL}(\Sigma)$ if, for all
schedulers $u$ of $M$, $\mu_{M,u}(L_{M,u}(\varphi)) = 1$.

2. A PPA $A$ over $\tilde{\Sigma}$ is said to satisfy a formula $\varphi \in \mathrm{LTL}^{\otimes}(\tilde{\Sigma})$ if, for all
schedulers $u$ of $M_A$, $\mu_{M_A,u}(L_{A,u}(\varphi)) = 1$.

3. Given a CPP $M$ over $\Sigma$ and an extended Streett (Büchi) automaton $B$ over
$\Sigma$, $M$ is said to satisfy $B$ if, for all schedulers $u$ of $M$, $\mu_{M,u}(L_{M,u}(B)) = 1$.

For example, the formula $[\Diamond \langle a \rangle \langle b \rangle tt]_1 \wedge [\Diamond \langle c \rangle \langle b \rangle tt]_2 \in \mathrm{LTL}^{\otimes}((\{a, b\}, \{b, c\}))$
is satisfied by the PPA from Figure 1.

A logic that specifies properties of a product system should not differentiate
between different linearizations of its parallel execution, a well-known
requirement in the domain of Mazurkiewicz traces [Leu02]. Let us check that this is the
case for our notion of satisfaction of PPA and LTL⊗.
For two words $\sigma, \sigma' \in (\bigcup_i \Sigma_i)^*$, we say that they are equivalent (with respect
to $\tilde{\Sigma}$) and write $\sigma \sim \sigma'$ if $\sigma \upharpoonright \Sigma_i = \sigma' \upharpoonright \Sigma_i$ for all $i \in Proc$. In other words, two
words are equivalent if they only differ in the ordering of independent actions.
We say that two actions are independent if they are not both members of a single
$\Sigma_i$ for some $i \in Proc$.
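
Equivalence and independence are easy to check on finite words. The following sketch assumes that words are Python strings of single-letter actions and that the distributed alphabet is a list of sets; the function names are hypothetical.

```python
def project(word, alphabet):
    """Erase all letters that are not contained in the alphabet."""
    return tuple(a for a in word if a in alphabet)


def equivalent(w1, w2, alphabets):
    """Two words are equivalent if all their local projections coincide."""
    return all(project(w1, s) == project(w2, s) for s in alphabets)


def independent(a, b, alphabets):
    """Two actions are independent if no local alphabet contains both."""
    return not any(a in s and b in s for s in alphabets)


alphabets = [{"a", "b"}, {"b", "c"}]
```

With the distributed alphabet $(\{a, b\}, \{b, c\})$, the words `ac` and `ca` are equivalent because `a` and `c` are independent, while `ab` and `ba` are not.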

Like a simple product automaton, a PPA is robust with respect to
the order in which independent actions are executed. To illustrate this, let us
consider the PPA from Figure 1. Both components can independently execute
the independent actions $a$ and $c$, each followed by a random move.
Such independence is reflected in the global system (cf. Figure 2): Starting from
the initial state $(p_0, q_0)$, executing $a$ and $c$ in the order $ac$
spans the same probability space wrt. all the possible nondeterministic successor
states $(p_0, q_0)$, $(p_0, q_2)$, $(p_2, q_0)$, and $(p_2, q_2)$ as executing them in the order $ca$.



5.2 The Algorithm 

Model checking PPA against LTL⊗ formulas is the problem of deciding whether a
given LTL⊗ formula is satisfied by a given PPA. As we will show in this subsection,
we can reduce this problem to the question whether the language defined by a
scheduler for a Streett automaton has positive measure with respect to a given
CPP. Therefore, we study this problem first:

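Let $\Sigma$ be an alphabet, $M = (Q, N, R, \Delta, P, Q^{in})$ be a CPP over $\Sigma$, and
$B = (S, S_0, \delta, F, \mathcal{G})$ be an extended Streett automaton over $\Sigma$. The product of $M$ and
$B$, the CPP $M_{M,B} = (Q_{M,B}, N_{M,B}, R_{M,B}, \Delta_{M,B}, P_{M,B}, Q^{in}_{M,B}, F_{M,B}, \mathcal{G}_{M,B})$
(with acceptance condition) over $\Sigma$, is given as follows:

- $Q_{M,B} = Q \times S$

- $N_{M,B} = N \times S$

- $R_{M,B} = R \times S$

- $((q, s), a, (q', s')) \in \Delta_{M,B}$ if $(q, a, q') \in \Delta$ and $s' \in \delta(s, a)$

- $P_{M,B}((q, s), (q', s')) = P(q, q')$ if $s = s'$, and $P_{M,B}((q, s), (q', s')) = 0$ if $s \neq s'$

- $Q^{in}_{M,B} = Q^{in} \times S_0$

- $F_{M,B} = (N \setminus N^M) \times F$

- $\mathcal{G}_{M,B} = \{(Q \times U, Q \times V) \mid (U, V) \in \mathcal{G}\}$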



We want to mark some SCCs of $M_{M,B}$ to be good in some sense and call a
set $D$ of its states accepting if, for all pairs $(U, V) \in \mathcal{G}$, $(q, s) \in D$ with $s \in U$
implies $(q', s') \in D$ for some $q'$ and $s' \in V$. Otherwise, $D$ is called rejecting. We
say that a state $(r, f)$ of a rejecting set $D$ is rejecting if there is a pair $(U, V) \in \mathcal{G}$
such that $f \in U$ and $D$ contains no state $(q, s)$ with $s \in V$.
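
These two notions translate into a few lines of code. The sketch assumes that product states are pairs `(q, s)` and that the Streett pairs `(U, V)` are given as pairs of Python sets over automaton states; the function names and the example sets are hypothetical.

```python
def is_accepting(D, pairs):
    """D is accepting if every pair (U, V) touched in U is also touched in V."""
    visited = {s for _, s in D}
    return all(not (visited & U) or bool(visited & V) for U, V in pairs)


def rejecting_states(D, pairs):
    """States (q, s) of D with s in U for some pair (U, V) that D never answers in V."""
    visited = {s for _, s in D}
    return {(q, s) for (q, s) in D
            for U, V in pairs if s in U and not (visited & V)}


pairs = [({"s1"}, {"s2"})]
good = {("q0", "s1"), ("q1", "s2")}
bad = {("q0", "s1"), ("q1", "s0")}
```

The set `good` is accepting, whereas `bad` is rejecting with `("q0", "s1")` as its only rejecting state.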

Theorem 1. For a CPP $M = (Q, N, R, \Delta, P, Q^{in})$ over an alphabet $\Sigma$ and a
deterministic extended Streett automaton $B = (S, \{s_0\}, \delta, F, \mathcal{G})$ over $\Sigma$, there is
a scheduler $u$ of $M$ with $\mu_{M,u}(L_{M,u}(B)) > 0$ iff

- there is a path in (the graph of) $M_{M,B}$ from an initial state to a final state
from $F_{M,B}$ or

- there is a set $D$ of states of $M_{M,B}$ satisfying the following:

(1) $M_{M,B}[D]$ is nontrivial and strongly connected,

(2) $D$ is accepting and reachable from a state of $Q^{in} \times \{s_0\}$, and

(3) for all transitions $((q, s), (q', s')) \in \Delta_{M,B}$ with $(q, s) \in D$ and $(q', s') \notin D$,
$(q, s)$ is nondeterministic, i.e., it holds that $(q, s) \in N_{M,B}$ (or,
equivalently, $q \in N$).

Proof. ($\Leftarrow$) Suppose there is a path $\beta \in (Q_{M,B})^*$ through $M_{M,B}$ from an
initial state to a state $(q, s)$ of $F_{M,B}$. It is easy to see that then a corresponding
scheduler (simply following the path) forces $M_{M,B}$ to visit $(q, s)$ with nonzero
probability. Otherwise, fix a path $\beta \in (Q_{M,B})^*$ through $M_{M,B}$ from an initial
state to a state of $D$. Let $\beta'$ be the projection of $\beta$ onto the first component.
The scheduler $u$ of $M$ satisfying $\mu_{M,u}(L_{M,u}(B)) > 0$ follows $\beta'$, taking $M_{M,B}$
from the initial state to $D$ and, henceforth, forces the trajectory both to stay
within $D$ and to almost surely visit each state of $D$ infinitely often. This can be
accomplished by, for a given nondeterministic state $(q, s)$, alternately choosing
the transitions $(q, s) \to (q', s')$ of $M_{M,B}$ with $(q', s') \in D$ (recall that the
history of a trajectory is at the scheduler’s disposal). Clearly, $\mu_{M,u}(C_{M,u}(\beta'))$ is
nonzero. Given $C_{M,u}(\beta')$, the conditional probability that $M$, wrt. $u$, follows a
trajectory that visits each state of $D$ infinitely often is one. As such a trajectory
is contained in $L_{M,u}(B)$, we conclude $\mu_{M,u}(L_{M,u}(B)) > 0$.

($\Rightarrow$) Note that a trajectory $x$ of $M$ wrt. $u$ unambiguously defines a path
$\tilde{x}$ through $M_{M,B}$ starting from an initial state. This is due to the fact that
$B$ is deterministic. Let $\mathcal{D}$ contain the subsets $D$ of states of $M_{M,B}$ such that
$M_{M,B}[D]$ is strongly connected. Furthermore, for $D \in \mathcal{D}$, let
$E(D) := \{x \in \Omega_{M,u} \mid \inf(\tilde{x}) = D\}$. Now suppose that $\mu_{M,u}(L_{M,u}(B)) > 0$ for a
scheduler $u$ of $M$. If $u$ leads $M_{M,B}$ from an initial state into a final state from
$F_{M,B}$, we are done. Otherwise, as

$$L_{M,u}(B) = \bigcup_{D \in \mathcal{D} \text{ is accepting}} E(D),$$

we can find an accepting set $D \in \mathcal{D}$ that satisfies $\mu_{M,u}(E(D)) > 0$. (Otherwise,
the probability of the countable union $L_{M,u}(B)$ of events would be zero.) As $D$ is
the infinity set of at least one infinite path through $M_{M,B}$ starting from an initial
state, it forms a nontrivial (strongly connected) subgraph of $M_{M,B}$, satisfying
condition (1). Now suppose there is a transition $(q, s) \to (q', s')$ of $M_{M,B}$ with
$(q, s) \in D$, $(q', s') \notin D$, and $q \in R$. As, for every trajectory $x \in E(D)$, $\tilde{x}$ visits
$(q, s)$ infinitely often (and each time the probability to exit $D$ is nonzero), it
will almost surely leave $D$ infinitely often so that we have $\mu_{M,u}(E(D)) = 0$,
contradicting our assumption. It follows that $D$ also satisfies condition (3) of
Theorem 1, which concludes our proof. □

Note that, in the above proof, we explicitly make use of the fact that a
trajectory of $M$ determines exactly one corresponding run of $M_{M,B}$ starting
from an initial state (recall that $B$ is deterministic).
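
Conditions (1) to (3) of Theorem 1 can be checked for a candidate set $D$ with elementary graph searches. The sketch below assumes that the product is given as a list of unlabelled edges between product states `(q, s)`, together with the set of randomizing product states and the Streett pairs as pairs of Python sets; all names and the two toy graphs are hypothetical, and the quadratic reachability test is chosen for brevity rather than efficiency.

```python
def reachable(sources, edges):
    """All nodes reachable from `sources` along `edges` (sources included)."""
    seen, todo = set(sources), list(sources)
    while todo:
        u = todo.pop()
        for x, y in edges:
            if x == u and y not in seen:
                seen.add(y)
                todo.append(y)
    return seen


def is_witness(D, edges, initial, randomizing, pairs):
    """Check conditions (1)-(3) of the theorem for the candidate set D."""
    D = set(D)
    inner = [(u, v) for u, v in edges if u in D and v in D]
    # (1) the induced subgraph is nontrivial and strongly connected
    cond1 = bool(inner) and all(reachable({u}, inner) >= D for u in D)
    # (2) D is accepting and reachable from an initial state
    visited = {s for _, s in D}
    accepting = all(not (visited & U) or bool(visited & V) for U, V in pairs)
    cond2 = accepting and bool(reachable(initial, edges) & D)
    # (3) every transition leaving D starts in a nondeterministic state
    cond3 = all(u not in randomizing for u, v in edges if u in D and v not in D)
    return cond1 and cond2 and cond3


n0, r0, r1, n1 = ("n0", "s0"), ("r0", "s0"), ("r1", "s0"), ("n1", "s0")
randomizing = {r0, r1}
# The scheduler can stay in {n0, r0}: the only exit starts in n0.
edges_ok = [(n0, r0), (r0, n0), (n0, r1), (r1, n1)]
# Here the random move out of r0 may leave {n0, r0}.
edges_bad = [(n0, r0), (r0, n0), (r0, n1)]
```

In the first graph the scheduler can simply avoid the exit, so $D = \{n0, r0\}$ is a witness; in the second graph the exit is taken by a random move, so condition (3) fails.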
Table 1. Model checking LTL⊗ specifications of PPA

Given a PPA $A$ over $\tilde{\Sigma}$ and a formula $\varphi \in \mathrm{LTL}^{\otimes}(\tilde{\Sigma})$.

Goal: Decide whether, for all schedulers $u$ of $M_A$, it holds that
$\mu_{M_A,u}(L_{A,u}(\varphi)) = 1$.

Solution:

1. From $A$, we construct the CPP $M_A$, and from $\varphi$, we construct
the deterministic extended Streett automaton $B_{\neg\varphi}$ with
$L(B_{\neg\varphi}) = L(\neg\varphi)$.

2. Compute (the graph of) $M_{M_A,B_{\neg\varphi}}$, remove those states that are
not reachable from an initial state of $M_{M_A,B_{\neg\varphi}}$, and let $G$ denote
the resulting graph.

3. Repeat

(a) Determine the sets $AC$ of nontrivial and accepting and $RC$ of
nontrivial and rejecting SCCs of $G$, respectively.

(b) For each $C \in RC$, remove the transitions going out from
rejecting states.

(c) For each $C \in AC$, do the following:

i. Find the set $H$ of states $(q, s) \in C$ with randomizing $q$
from where there is a transition leaving $C$.

ii. If $H$ is the empty set, then return “No”. Otherwise,
remove the states of $H$ and corresponding transitions
from $G$.

until $AC \cup RC = \emptyset$.

4. Test whether a scheduler can force $M_{M_A,B_{\neg\varphi}}$ from an initial state
into a final state with probability greater than 0, i.e., whether
there is a path from an initial state of $M_{M_A,B_{\neg\varphi}}$ to a state in
$F_{M_A,B_{\neg\varphi}}$. If this is the case, return “No”. Otherwise, return “Yes”.



Based on Theorem 1, we now provide an algorithm that solves the model-checking
problem for PPA, i.e., it decides for a given PPA $\mathcal{A}$ over $\Sigma$ and a formula
$\varphi \in \mathrm{LTL}^{\otimes}(\Sigma)$ whether, for all schedulers $u$ of $\mathcal{M}_{\mathcal{A}}$,
$\mu_{\mathcal{M}_{\mathcal{A}},u}(L_{\mathcal{A},u}(\varphi)) = 1$ (namely
iff there is no scheduler $u$ of $\mathcal{M}_{\mathcal{A}}$ such that $\mu_{\mathcal{M}_{\mathcal{A}},u}(L_{\mathcal{A},u}(\neg\varphi)) > 0$). The algorithm
is shown in Table 1. In the first step, the given PPA is transformed into a CPP.
The given formula is negated and translated into a product automaton accepting
the models of the negated formula. This step is described in [MD97] and is omitted
here. It is straightforward to translate a product automaton into an extended
Büchi automaton, which again can be translated into a deterministic extended
Streett automaton [GTW02]. In the second step, we combine the obtained CPP
and Streett automaton into a single system. The characterization provided in
Theorem 1 is used in steps 3 and 4 to answer the model-checking question.
Obviously, the algorithm terminates. Furthermore, it returns the answer “No”
iff there is a scheduler $u$ of $\mathcal{M}_{\mathcal{A}}$ such that $\mu_{\mathcal{M}_{\mathcal{A}},u}(L_{\mathcal{M}_{\mathcal{A}},u}(\mathcal{B}_{\neg\varphi})) > 0$.
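
The purely graph-theoretic parts of Table 1 (reachability pruning in step 2, the search for nontrivial SCCs in step 3(a), and the set $H$ in step 3(c)i) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the dictionary `succ` mapping a product state `(q, s)` to its successors, and the predicate `is_randomizing` are all assumptions made for the example. Classifying an SCC as accepting or rejecting depends on the extended Streett condition and is deliberately left out.

```python
from collections import deque

def reachable(initial, succ):
    """Step 2: states reachable from some initial state (breadth-first search)."""
    seen = set(initial)
    queue = deque(initial)
    while queue:
        v = queue.popleft()
        for w in succ.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def nontrivial_sccs(states, succ):
    """Step 3(a), graph part: SCCs of the subgraph on `states` that contain
    at least one edge (Kosaraju's algorithm, iterative)."""
    states = set(states)
    # Pass 1: record states in order of depth-first completion.
    order, visited = [], set()
    for root in states:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(succ.get(root, ())))]
        while stack:
            v, it = stack[-1]
            for w in it:
                if w in states and w not in visited:
                    visited.add(w)
                    stack.append((w, iter(succ.get(w, ()))))
                    break
            else:
                order.append(v)
                stack.pop()
    # Reversed edges, restricted to `states`.
    pred = {v: [] for v in states}
    for v in states:
        for w in succ.get(v, ()):
            if w in states:
                pred[w].append(v)
    # Pass 2: sweep the reversed graph in decreasing completion order.
    assigned, sccs = set(), []
    for root in reversed(order):
        if root in assigned:
            continue
        comp, todo = {root}, [root]
        assigned.add(root)
        while todo:
            v = todo.pop()
            for w in pred[v]:
                if w not in assigned:
                    assigned.add(w)
                    comp.add(w)
                    todo.append(w)
        # Keep the component only if it has an internal edge.
        if any(w in comp for v in comp for w in succ.get(v, ())):
            sccs.append(comp)
    return sccs

def randomizing_exits(comp, states, succ, is_randomizing):
    """Step 3(c)i: states (q, s) in `comp` with randomizing q that have a
    transition to a state of the current graph outside `comp`."""
    return {
        (q, s)
        for (q, s) in comp
        if is_randomizing(q)
        and any(w in states and w not in comp for w in succ.get((q, s), ()))
    }

# Hypothetical toy graph: the state named "b" is the only randomizing one.
toy_succ = {
    ("a", 0): [("b", 0)],
    ("b", 0): [("a", 0), ("c", 0)],
    ("c", 0): [("c", 0)],
    ("d", 0): [("a", 0)],  # not reachable from the initial state
}
G = reachable([("a", 0)], toy_succ)
for C in nontrivial_sccs(G, toy_succ):
    print(sorted(C), sorted(randomizing_exits(C, G, toy_succ, lambda q: q == "b")))
```

In the loop of step 3, the states returned by `randomizing_exits` would be removed from the current graph before the SCCs are recomputed; an empty result for an accepting SCC corresponds to the “No” answer of step 3(c)ii.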

To simplify our presentation, we described the algorithm in a stepwise man- 
ner. It is clear that steps 1 and 2 can be done on demand by steps 3 and 4. Thus, 
we can get an on-the-fly procedure. 
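
One way to read “on demand” is that product states are generated only when a search actually reaches them. The sketch below illustrates this idea under stated assumptions: the CPP is given by a hypothetical function `cpp_succ(q)` yielding `(action, q')` pairs, the deterministic automaton by a hypothetical transition function `delta(s, action)`, and the automaton is assumed to read action labels. None of these interfaces is taken from the paper.

```python
def lazy_product_succ(cpp_succ, delta):
    """Return a successor function for the product that computes
    successors of (q, s) only when asked (nothing is precomputed)."""
    def succ(state):
        q, s = state
        for action, q_next in cpp_succ(q):
            yield (q_next, delta(s, action))
    return succ

def can_reach(initial, succ, is_final):
    """Depth-first search that stops as soon as a final state is found,
    so only the explored part of the product is ever constructed."""
    seen = set(initial)
    stack = list(initial)
    while stack:
        v = stack.pop()
        if is_final(v):
            return True
        for w in succ(v):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False
```

The same successor function could feed an SCC search, so that steps 3 and 4 drive the construction of the product instead of waiting for steps 1 and 2 to finish.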
Furthermore, the algorithm can easily be adapted to answer the model-checking
problem for LTL or Büchi-automata specifications. Only step 1 has
to be adjusted to produce a Streett automaton for a given LTL formula or Büchi
automaton.

Let us discuss the complexity of our algorithm. Starting from a Büchi automaton
$\mathcal{B}$ with $n$ states, construct an equivalent deterministic extended Streett
automaton $\mathcal{B}'$ with $2^{O(n \log n)}$ states and $O(n)$ pairs in the acceptance
component. Say our CPP $\mathcal{M}$ has $m$ states. The number of states of $\mathcal{M}_{\mathcal{M},\mathcal{B}'}$ is not
greater than $m \cdot 2^{O(n \log n)}$. Thus, steps (a), (b), and (c) are repeated at most
$m \cdot 2^{O(n \log n)}$ times, respectively. Determining the SCCs of $G$ can be done in
time linear in the size of $\mathcal{M}_{\mathcal{M},\mathcal{B}'}$. Overall, the algorithm (modified for CPPs and
Büchi automata) runs in time $O(m^2 \cdot 2^{O(n \log n)})$, i.e., it is quadratic in $|\mathcal{M}|$ and
exponential in $|\mathcal{B}|$.
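
The arithmetic behind this bound can be spelled out as follows. This is a sketch under the simplifying assumption that the size of the product graph is proportional to its number of states $N$; the symbols $N$ and $T$ (total running time) are introduced here only for illustration.

```latex
% Assumption: the size of the product graph is proportional to its number of states N.
\begin{align*}
N &\le m \cdot 2^{O(n \log n)}
  && \text{(states of the product of the CPP and } \mathcal{B}'\text{)} \\
T &\le N \cdot O(N)
  && \text{(at most } N \text{ iterations, each linear in } N\text{)} \\
  &= O\bigl(m^2 \cdot (2^{O(n \log n)})^2\bigr)
   = O\bigl(m^2 \cdot 2^{O(n \log n)}\bigr)
\end{align*}
```

The last step uses that squaring $2^{O(n \log n)}$ only doubles the constant hidden in the exponent.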

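Proposition 2. Given a CPP $\mathcal{M}$ and a Büchi automaton $\mathcal{B}$, it can be decided
in time $O(|\mathcal{M}|^2 \cdot 2^{O(|\mathcal{B}| \log |\mathcal{B}|)})$ whether $\mu_{\mathcal{M},u}(L_{\mathcal{M},u}(\mathcal{B})) > 0$ for some scheduler $u$ of $\mathcal{M}$.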

Translating an $\mathrm{LTL}^{\otimes}$ formula into a product automaton is of exponential
complexity with respect to the length of the formula. The product automaton
gives rise to a Büchi automaton of the same order¹. Thus, translating an $\mathrm{LTL}^{\otimes}$
formula into a deterministic extended Streett automaton is of double exponential
complexity. Together with Proposition 2, we get

Theorem 2. Given a PPA $\mathcal{A}$ and an $\mathrm{LTL}^{\otimes}$ formula $\varphi$, checking whether $\mathcal{A}$
satisfies $\varphi$ can be done in time polynomial in the size of $\mathcal{A}$ and double exponential
in the size of $\varphi$.
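
The double exponential dependence on the formula can be traced by composing the two translations. The following is a sketch, assuming a fixed number of components, writing $k$ for the length of $\varphi$, $n$ for the number of states of the intermediate Büchi automaton, and $m$ for the number of states of the CPP obtained from $\mathcal{A}$; these letters are chosen here for illustration.

```latex
% Assumptions: fixed number of components; k = length of the formula.
\begin{align*}
n &= 2^{O(k)}
  && \text{(formula to product automaton to B\"uchi automaton)} \\
n \log n &= 2^{O(k)} \cdot O(k) = 2^{O(k)} \\
2^{O(n \log n)} &= 2^{2^{O(k)}}
  && \text{(states of the deterministic extended Streett automaton)} \\
T &= O\bigl(m^2 \cdot 2^{2^{O(k)}}\bigr)
  && \text{(bound of Proposition 2 applied to the result)}
\end{align*}
```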

6 Conclusion and Future Work 

In this paper, we presented probabilistic product automata (PPA) as a model 
for distributed probabilistic programs. It is based on the well-known model of 
product automata, but extended by random transitions. Thus, a probabilistic 
product automaton is a product of probabilistic systems that run in parallel and 
synchronize by common actions. Every probabilistic system can perform labelled
nondeterministic as well as randomized transitions.

For the product version of linear temporal logic, $\mathrm{LTL}^{\otimes}$, originally defined for
product automata, we extended the notion of satisfaction to the probabilistic
setting. Intuitively, we say that a PPA satisfies an $\mathrm{LTL}^{\otimes}$ formula if, for every
scheduler (that fixes the nondeterministic choices of the system), almost all runs
satisfy the given formula.

The main contribution of the paper is a procedure that automatically answers
the question whether a given PPA satisfies a given $\mathrm{LTL}^{\otimes}$ formula. This problem
is also known as the model-checking problem.

¹ Note that a Büchi automaton corresponding to a product automaton grows exponentially
in the number of components. Fixing the number of components, however,
it grows polynomially with respect to the size of the components.
Our procedure is automata-based and can be implemented on-the-fly, which
often provides good run-time behavior in practice, despite high worst-case
complexity.

Additionally, we get a procedure for checking automata specifications and 
LTL specifications of PPA. 

It would be interesting to extend our work to the setting of fair executions
of the product system. More specifically, we currently check satisfaction for all
schedulers, and it remains to be seen whether restricting to fair schedulers gives
different results.

Furthermore, it would be interesting to see whether techniques such as those used in
[CY95] can improve our procedure to get single exponential complexity with
respect to the length of the formula.